CRUSHTOOL(8)                         Ceph                         CRUSHTOOL(8)

NAME

       crushtool - CRUSH map manipulation tool

SYNOPSIS

       crushtool ( -d map | -c map.txt | --build --num_osds numosds
       layer1 ... | --test ) [ -o outfile ]

DESCRIPTION

       crushtool is a utility that lets you create, compile, decompile and
       test CRUSH map files.

       CRUSH is a pseudo-random data distribution algorithm that efficiently
       maps input values (which, in the context of Ceph, correspond to
       Placement Groups) across a heterogeneous, hierarchically structured
       device map. The algorithm was originally described in detail in the
       following paper (although it has evolved somewhat since then):

          http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

       The tool has four modes of operation.

       --compile|-c map.txt
              will compile a plaintext map.txt into a binary map file.

       --decompile|-d map
              will take the compiled map and decompile it into a plaintext
              source file, suitable for editing.

       --build --num_osds {num-osds} layer1 ...
              will create a map with the given layer structure. See below
              for a detailed explanation.

       --test will perform a dry run of a CRUSH mapping for a range of input
              values [--min-x,--max-x] (default [0,1023]) which can be
              thought of as simulated Placement Groups. See below for a more
              detailed explanation.

       Unlike other Ceph tools, crushtool does not accept generic options
       such as --debug-crush from the command line. They can, however, be
       provided via the CEPH_ARGS environment variable. For instance, to
       silence all output from the CRUSH subsystem:

          CEPH_ARGS="--debug-crush 0" crushtool ...

RUNNING TESTS WITH --TEST

       The test mode will use the input crush map (as specified with -i
       map) and perform a dry run of CRUSH mapping or random placement (if
       --simulate is set). On completion, two kinds of reports can be
       created. 1) The --show-... option outputs human readable information
       on stderr. 2) The --output-csv option creates CSV files that are
       documented by the --help-output option.

       Note: Each Placement Group (PG) has an integer ID which can be
       obtained from ceph pg dump (for example PG 2.2f means pool id 2, PG
       id 0x2f = 47). The pool and PG IDs are combined by a function to get
       a value which is given to CRUSH to map it to OSDs. crushtool does
       not know about PGs or pools; it only runs simulations by mapping
       values in the range [--min-x,--max-x].

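       For example, the simulation can be limited to the first 100 input
       values as follows (a minimal sketch; the compiled map file name
       crushmap is an assumption):

          $ crushtool -i crushmap --test --min-x 0 --max-x 99 \
                 --num-rep 3 --show-statistics
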
       --show-statistics
              Displays a summary of the distribution. For instance:

                 rule 1 (metadata) num_rep 5 result size == 5:    1024/1024

              shows that rule 1, which is named metadata, successfully
              mapped 1024 values to result size == 5 devices when trying to
              map them to num_rep 5 replicas. When it fails to provide the
              required mapping, presumably because the number of tries must
              be increased, a breakdown of the failures is displayed. For
              instance:

                 rule 1 (metadata) num_rep 10 result size == 8:   4/1024
                 rule 1 (metadata) num_rep 10 result size == 9:   93/1024
                 rule 1 (metadata) num_rep 10 result size == 10:  927/1024

              shows that although num_rep 10 replicas were required, 4 out
              of 1024 values (4/1024) were mapped to only result size == 8
              devices.

       --show-mappings
              Displays the mapping of each value in the range
              [--min-x,--max-x]. For instance:

                 CRUSH rule 1 x 24 [11,6]

              shows that value 24 is mapped to devices [11,6] by rule 1.

              One of the following is required when using the
              --show-mappings option:

                 a. --num-rep

                 b. both --min-rep and --max-rep

              --num-rep stands for "number of replicas"; it indicates the
              number of replicas in a pool and is used to specify an exact
              number of replicas (for example --num-rep 5). --min-rep and
              --max-rep are used together to specify a range of replicas
              (for example, --min-rep 1 --max-rep 10).

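              For example, the mappings of the first ten input values could
              be printed for an exact replica count, or for a range of
              replica counts (a sketch; the compiled map file name crushmap
              is an assumption):

                 $ crushtool -i crushmap --test --show-mappings \
                        --min-x 0 --max-x 9 --num-rep 3

                 $ crushtool -i crushmap --test --show-mappings \
                        --min-x 0 --max-x 9 --min-rep 1 --max-rep 3
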
       --show-bad-mappings
              Displays which value failed to be mapped to the required
              number of devices. For instance:

                 bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

              shows that when rule 1 was required to map 7 devices, it
              could map only six: [8,10,2,11,6,9].

       --show-utilization
              Displays the expected and actual utilization for each device,
              for each number of replicas. For instance:

                 device 0: stored : 951      expected : 853.333
                 device 1: stored : 963      expected : 853.333
                 ...

              shows that device 0 stored 951 values and was expected to
              store 853.333. Implies --show-statistics.

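              For example (a sketch; the compiled map file name crushmap
              and the replica count are assumptions):

                 $ crushtool -i crushmap --test --show-utilization \
                        --num-rep 3
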
       --show-utilization-all
              Displays the same as --show-utilization but does not suppress
              output when the weight of a device is zero. Implies
              --show-statistics.

       --show-choose-tries
              Displays how many attempts were needed to find a device
              mapping. For instance:

                 0:     95224
                 1:      3745
                 2:      2225
                 ..

              shows that 95224 mappings succeeded without retries, 3745
              mappings succeeded with one retry, etc. There are as many
              rows as the value of the --set-choose-total-tries option.

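              For example, the retry histogram could be collected with (a
              sketch; the compiled map file name crushmap and the replica
              count are assumptions):

                 $ crushtool -i crushmap --test --show-choose-tries \
                        --num-rep 3
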
       --output-csv
              Creates CSV files (in the current directory) containing
              information documented by --help-output. The files are named
              after the rule used when collecting the statistics. For
              instance, if the rule 'metadata' is used, the CSV files will
              be:

                 metadata-absolute_weights.csv
                 metadata-device_utilization.csv
                 ...

              The first line of each file briefly explains the column
              layout. For instance:

                 metadata-absolute_weights.csv
                 Device ID, Absolute Weight
                 0,1
                 ...

       --output-name NAME
              Prepends NAME to the file names generated when --output-csv
              is specified. For instance --output-name FOO will create the
              files:

                 FOO-metadata-absolute_weights.csv
                 FOO-metadata-device_utilization.csv
                 ...

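              For example, CSV reports prefixed with FOO could be produced
              as follows (a sketch; the compiled map file name crushmap and
              the replica count are assumptions):

                 $ crushtool -i crushmap --test --output-csv \
                        --output-name FOO --num-rep 3
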
       The --set-... options can be used to modify the tunables of the
       input crush map. The input crush map is modified in memory. For
       example:

          $ crushtool -i mymap --test --show-bad-mappings
          bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

       could be fixed by increasing choose-total-tries as follows:

          $ crushtool -i mymap --test \
                 --show-bad-mappings --set-choose-total-tries 500

BUILDING A MAP WITH --BUILD

       The build mode will generate hierarchical maps. The first argument
       specifies the number of devices (leaves) in the CRUSH hierarchy.
       Each layer describes how the layer (or devices) preceding it should
       be grouped.

       Each layer consists of:

          bucket ( uniform | list | tree | straw | straw2 ) size

       The first component, bucket, is the name of the bucket type in the
       layer (e.g. "rack"). Each bucket name will be built by appending a
       unique number to the bucket string (e.g. "rack0", "rack1"...).

       The second component is the bucket algorithm: straw should be used
       most of the time.

       The third component is the maximum size of the bucket. A size of
       zero means a bucket of infinite capacity.

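       For instance, the following hypothetical invocation groups 12
       devices into hosts of 4 OSDs each (uniform buckets) under a single
       unbounded straw root; the output file name and layer names are
       illustrative only:

          $ crushtool -o crushmap --build --num_osds 12 \
                 host uniform 4 \
                 root straw 0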

EXAMPLE

       Suppose we have two rows with two racks each and 20 nodes per rack.
       Suppose each node contains 4 storage devices for Ceph OSD Daemons.
       This configuration allows us to deploy 320 Ceph OSD Daemons. Let's
       assume a 42U rack with 2U nodes, leaving an extra 2U for a rack
       switch.

       To reflect our hierarchy of devices, nodes, racks and rows, we would
       execute the following:

          $ crushtool -o crushmap --build --num_osds 320 \
                 node straw 4 \
                 rack straw 20 \
                 row straw 2 \
                 root straw 0
          # id        weight  type name       reweight
          -87 320     root root
          -85 160             row row0
          -81 80                      rack rack0
          -1  4                               node node0
          0   1                                       osd.0   1
          1   1                                       osd.1   1
          2   1                                       osd.2   1
          3   1                                       osd.3   1
          -2  4                               node node1
          4   1                                       osd.4   1
          5   1                                       osd.5   1
          ...

       CRUSH rules are created so the generated crushmap can be tested.
       They are the same rules as the ones created by default when creating
       a new Ceph cluster. They can be further edited with:

          # decompile
          crushtool -d crushmap -o map.txt

          # edit
          emacs map.txt

          # recompile
          crushtool -c map.txt -o crushmap

RECLASSIFY

       The reclassify function allows users to transition from older maps
       that maintain parallel hierarchies for OSDs of different types to a
       modern CRUSH map that makes use of the device class feature. For
       more information, see
       https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.

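       As a rough sketch only, an invocation following that migration guide
       might look like the one below; the --reclassify options and the
       file, root, class and bucket-pattern arguments shown here are taken
       from that guide rather than documented on this page, so consult it
       before use:

          $ crushtool -i original --reclassify \
                 --reclassify-root default hdd \
                 --reclassify-bucket %-ssd ssd default \
                 -o adjusted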

EXAMPLE OUTPUT FROM --TEST

       See
       https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
       for sample crushtool --test commands and the output they produce.

AVAILABILITY

       crushtool is part of Ceph, a massively scalable, open-source,
       distributed storage system. Please refer to the Ceph documentation
       at https://docs.ceph.com for more information.

SEE ALSO

       ceph(8), osdmaptool(8)

AUTHORS

       John Wilkins, Sage Weil, Loic Dachary

COPYRIGHT

       2010-2023, Inktank Storage, Inc. and contributors. Licensed under
       Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0)

dev                              Nov 15, 2023                     CRUSHTOOL(8)