CRUSHTOOL(8)                          Ceph                        CRUSHTOOL(8)


NAME
       crushtool - CRUSH map manipulation tool

SYNOPSIS
       crushtool ( -d map | -c map.txt | --build --num_osds numosds
       layer1 ... | --test ) [ -o outfile ]

DESCRIPTION
       crushtool is a utility that lets you create, compile, decompile and
       test CRUSH map files.

       CRUSH is a pseudo-random data distribution algorithm that efficiently
       maps input values (which, in the context of Ceph, correspond to
       Placement Groups) across a heterogeneous, hierarchically structured
       device map. The algorithm was originally described in detail in the
       following paper (although it has evolved some since then):

          http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

       The tool has four modes of operation.

       --compile|-c map.txt
              will compile a plaintext map.txt into a binary map file.

       --decompile|-d map
              will take the compiled map and decompile it into a plaintext
              source file, suitable for editing.

       --build --num_osds {num-osds} layer1 ...
              will create a map with the given layer structure. See below
              for a detailed explanation.

       --test will perform a dry run of a CRUSH mapping for a range of input
              values [--min-x,--max-x] (default [0,1023]) which can be
              thought of as simulated Placement Groups. See below for a more
              detailed explanation.
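
       The compile and decompile modes are commonly combined with the ceph
       CLI to inspect or edit the map of a running cluster. A sketch of such
       a round trip (the getcrushmap step is a ceph(8) command, not part of
       crushtool, and assumes a reachable cluster; file names are arbitrary):

          ceph osd getcrushmap -o crushmap            # fetch the compiled map
          crushtool -d crushmap -o crushmap.txt       # decompile for editing
          crushtool -c crushmap.txt -o crushmap.new   # recompile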

       Unlike other Ceph tools, crushtool does not accept generic options
       such as --debug-crush from the command line. They can, however, be
       provided via the CEPH_ARGS environment variable. For instance, to
       silence all output from the CRUSH subsystem:

          CEPH_ARGS="--debug-crush 0" crushtool ...

RUNNING TESTS WITH --TEST
       The test mode will use the input crush map (as specified with -i map)
       and perform a dry run of CRUSH mapping or random placement (if
       --simulate is set). On completion, two kinds of reports can be
       created: 1) the --show-... options output human-readable information
       on stderr, and 2) the --output-csv option creates CSV files that are
       documented by the --help-output option.

       Note: Each Placement Group (PG) has an integer ID which can be
       obtained from ceph pg dump (for example PG 2.2f means pool id 2, PG
       id 0x2f). The pool and PG IDs are combined by a function into a value
       which is given to CRUSH to map it to OSDs. crushtool does not know
       about PGs or pools; it only runs simulations by mapping values in the
       range [--min-x,--max-x].
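
       For instance, a dry run of 1024 simulated values through rule 0 with
       three replicas could look like the following (the rule id, replica
       count and map file name are illustrative):

          crushtool -i crushmap --test --min-x 0 --max-x 1023 \
                    --rule 0 --num-rep 3 --show-mappings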

       --show-statistics
              Displays a summary of the distribution. For instance:

                 rule 1 (metadata) num_rep 5 result size == 5: 1024/1024

              shows that rule 1, which is named metadata, successfully
              mapped 1024 values to result size == 5 devices when trying to
              map them to num_rep 5 replicas. When it fails to provide the
              required mapping, presumably because the number of tries must
              be increased, a breakdown of the failures is displayed. For
              instance:

                 rule 1 (metadata) num_rep 10 result size == 8: 4/1024
                 rule 1 (metadata) num_rep 10 result size == 9: 93/1024
                 rule 1 (metadata) num_rep 10 result size == 10: 927/1024

              shows that although num_rep 10 replicas were required, 4 out
              of 1024 values (4/1024) were mapped to only result size == 8
              devices.
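
              A report like the one above could be produced with a command
              along the following lines (the rule id and replica count are
              illustrative):

                 crushtool -i crushmap --test --rule 1 --num-rep 5 \
                           --show-statistics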

       --show-mappings
              Displays the mapping of each value in the range
              [--min-x,--max-x]. For instance:

                 CRUSH rule 1 x 24 [11,6]

              shows that value 24 is mapped to devices [11,6] by rule 1.

       --show-bad-mappings
              Displays which values failed to be mapped to the required
              number of devices. For instance:

                 bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

              shows that when rule 1 was required to map 7 devices, it could
              map only six: [8,10,2,11,6,9].

       --show-utilization
              Displays the expected and actual utilization for each device,
              for each number of replicas. For instance:

                 device 0: stored : 951 expected : 853.333
                 device 1: stored : 963 expected : 853.333
                 ...

              shows that device 0 stored 951 values and was expected to
              store 853. Implies --show-statistics.
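
              The expected figure is proportional to the device weight; the
              853.333 above is consistent with, for example, 1024 values
              mapped to num_rep 10 replicas over 12 equally weighted devices
              (1024 * 10 / 12). A utilization report could be produced with
              a command such as:

                 crushtool -i crushmap --test --rule 1 --num-rep 3 \
                           --show-utilization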

       --show-utilization-all
              Displays the same as --show-utilization but does not suppress
              output when the weight of a device is zero. Implies
              --show-statistics.

       --show-choose-tries
              Displays how many attempts were needed to find a device
              mapping. For instance:

                 0: 95224
                 1: 3745
                 2: 2225
                 ..

              shows that 95224 mappings succeeded without retries, 3745
              mappings needed one retry, etc. There are as many rows as the
              value of the --set-choose-total-tries option.

       --output-csv
              Creates CSV files (in the current directory) containing
              information documented by --help-output. The files are named
              after the rule used when collecting the statistics. For
              instance, if the rule 'metadata' is used, the CSV files will
              be:

                 metadata-absolute_weights.csv
                 metadata-device_utilization.csv
                 ...

              The first line of each file briefly explains the column
              layout. For instance:

                 metadata-absolute_weights.csv
                 Device ID, Absolute Weight
                 0,1
                 ...
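
              For instance, the following produces the CSV files for one
              rule and then inspects one of them (the rule id and the
              resulting file names depend on the map being tested):

                 crushtool -i crushmap --test --rule 1 --num-rep 5 \
                           --output-csv
                 head -2 metadata-device_utilization.csv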

       --output-name NAME
              Prepend NAME to the file names generated when --output-csv is
              specified. For instance --output-name FOO will create files:

                 FOO-metadata-absolute_weights.csv
                 FOO-metadata-device_utilization.csv
                 ...

       The --set-... options can be used to modify the tunables of the input
       crush map. The input crush map is modified in memory. For example:

          $ crushtool -i mymap --test --show-bad-mappings
          bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

       could be fixed by increasing the choose-total-tries as follows:

          $ crushtool -i mymap --test \
              --show-bad-mappings --set-choose-total-tries 500

BUILDING A MAP
       The build mode will generate hierarchical maps. The first argument
       specifies the number of devices (leaves) in the CRUSH hierarchy. Each
       layer describes how the layer (or devices) preceding it should be
       grouped.

       Each layer consists of:

          bucket ( uniform | list | tree | straw | straw2 ) size

       The first component, bucket, is the type of the buckets in the layer
       (e.g. "rack"). Each bucket name will be built by appending a unique
       number to the bucket string (e.g. "rack0", "rack1"...).

       The second component is the type of bucket: straw should be used most
       of the time.

       The third component is the maximum size of the bucket. A size of zero
       means a bucket of infinite capacity.

EXAMPLE
       Suppose we have two rows with two racks each and 20 nodes per rack.
       Suppose each node contains 4 storage devices for Ceph OSD Daemons.
       This configuration allows us to deploy 320 Ceph OSD Daemons. Let's
       assume a 42U rack with 2U nodes, leaving an extra 2U for a rack
       switch.

       To reflect our hierarchy of devices, nodes, racks and rows, we would
       execute the following:

          $ crushtool -o crushmap --build --num_osds 320 \
                node straw 4 \
                rack straw 20 \
                row straw 2 \
                root straw 0
          # id    weight  type name       reweight
          -87     320     root root
          -85     160        row row0
          -81     80            rack rack0
          -1      4                 node node0
          0       1                     osd.0   1
          1       1                     osd.1   1
          2       1                     osd.2   1
          3       1                     osd.3   1
          -2      4                 node node1
          4       1                     osd.4   1
          5       1                     osd.5   1
          ...

       CRUSH rules are created so the generated crushmap can be tested. They
       are the same rules as the ones created by default when creating a new
       Ceph cluster. They can be further edited with:

          # decompile
          crushtool -d crushmap -o map.txt

          # edit
          emacs map.txt

          # recompile
          crushtool -c map.txt -o crushmap
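
       The recompiled map can then be checked with the test mode described
       above and, if desired, installed in a running cluster with the ceph
       CLI (the setcrushmap step is a ceph(8) command, not part of
       crushtool, and assumes a reachable cluster):

          # sanity-check the edited map before using it
          crushtool -i crushmap --test --show-statistics --num-rep 3

          # install it as the cluster's CRUSH map
          ceph osd setcrushmap -i crushmap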

RECLASSIFY
       The reclassify function allows users to transition from older maps
       that maintain parallel hierarchies for OSDs of different types to a
       modern CRUSH map that makes use of the device class feature. For more
       information, see
       http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.
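
       As a sketch only: the document linked above describes reclassify
       options such as --reclassify-root and --reclassify-bucket, used along
       the following lines (bucket patterns and device classes are
       illustrative, and the exact invocation should be checked against that
       page and crushtool --help):

          crushtool -i original.map --reclassify \
                    --reclassify-root default hdd \
                    --reclassify-bucket %-ssd ssd default \
                    -o adjusted.map

          # compare how many inputs would be remapped by the change
          crushtool -i original.map --compare adjusted.map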

EXAMPLE
       See
       https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
       for sample crushtool --test commands and output produced thereby.

AVAILABILITY
       crushtool is part of Ceph, a massively scalable, open-source,
       distributed storage system. Please refer to the Ceph documentation at
       http://ceph.com/docs for more information.

SEE ALSO
       ceph(8), osdmaptool(8)

AUTHOR
       John Wilkins, Sage Weil, Loic Dachary

COPYRIGHT
       2010-2021, Inktank Storage, Inc. and contributors. Licensed under
       Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0)




dev                              Mar 18, 2021                     CRUSHTOOL(8)