CRUSHTOOL(8)                         Ceph                         CRUSHTOOL(8)

NAME

       crushtool - CRUSH map manipulation tool

SYNOPSIS

       crushtool ( -d map | -c map.txt | --build --num_osds numosds
       layer1 ... | --test ) [ -o outfile ]

DESCRIPTION

       crushtool is a utility that lets you create, compile, decompile and
       test CRUSH map files.

       CRUSH is a pseudo-random data distribution algorithm that efficiently
       maps input values (which, in the context of Ceph, correspond to
       Placement Groups) across a heterogeneous, hierarchically structured
       device map. The algorithm was originally described in detail in the
       following paper (although it has evolved somewhat since then):

          http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

       The tool has four modes of operation.

       --compile|-c map.txt
              will compile a plaintext map.txt into a binary map file.

       --decompile|-d map
              will take the compiled map and decompile it into a plaintext
              source file, suitable for editing.

       --build --num_osds {num-osds} layer1 ...
              will create a map with the given layer structure. See below
              for a detailed explanation.

       --test will perform a dry run of a CRUSH mapping for a range of
              input values [--min-x,--max-x] (default [0,1023]) which can
              be thought of as simulated Placement Groups. See below for a
              more detailed explanation.

       Unlike other Ceph tools, crushtool does not accept generic options
       such as --debug-crush from the command line. They can, however, be
       provided via the CEPH_ARGS environment variable. For instance, to
       silence all output from the CRUSH subsystem:

          CEPH_ARGS="--debug-crush 0" crushtool ...

RUNNING TESTS WITH --TEST

       The test mode will use the input crush map (as specified with -i
       map) and perform a dry run of CRUSH mapping or random placement (if
       --simulate is set). On completion, two kinds of reports can be
       created. 1) The --show-... options output human-readable information
       on stderr. 2) The --output-csv option creates CSV files that are
       documented by the --help-output option.
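
       For instance, a minimal dry run over the default input range
       [0,1023], printing a summary on stderr (the map file name crushmap
       is a placeholder for your own compiled map):

          $ crushtool -i crushmap --test --show-statistics
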
       Note: Each Placement Group (PG) has an integer ID which can be
       obtained from ceph pg dump (for example PG 2.2f means pool id 2, PG
       id 0x2f = 47). The pool and PG IDs are combined by a function to get
       a value which is given to CRUSH to map it to OSDs. crushtool does
       not know about PGs or pools; it only runs simulations by mapping
       values in the range [--min-x,--max-x].

       --show-statistics
              Displays a summary of the distribution. For instance:

                 rule 1 (metadata) num_rep 5 result size == 5:    1024/1024

              shows that rule 1, which is named metadata, successfully
              mapped 1024 values to result size == 5 devices when trying to
              map them to num_rep 5 replicas. When it fails to provide the
              required mapping, presumably because the number of tries must
              be increased, a breakdown of the failures is displayed. For
              instance:

                 rule 1 (metadata) num_rep 10 result size == 8:   4/1024
                 rule 1 (metadata) num_rep 10 result size == 9:   93/1024
                 rule 1 (metadata) num_rep 10 result size == 10:  927/1024

              shows that although num_rep 10 replicas were required, 4 out
              of 1024 values (4/1024) were mapped to only 8 devices (result
              size == 8).
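
              A report like the above can be reproduced with a command
              along these lines (the map file name is a placeholder; --rule
              and --num-rep, not described in this page, select the rule
              and the number of replicas to simulate):

                 $ crushtool -i crushmap --test --show-statistics \
                       --rule 1 --num-rep 10
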
       --show-mappings
              Displays the mapping of each value in the range
              [--min-x,--max-x]. For instance:

                 CRUSH rule 1 x 24 [11,6]

              shows that value 24 is mapped to devices [11,6] by rule 1.
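
              To keep the output manageable, the input range can be
              narrowed with --min-x and --max-x. For instance (the map file
              name is a placeholder):

                 $ crushtool -i crushmap --test --show-mappings \
                       --min-x 0 --max-x 9
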
       --show-bad-mappings
              Displays which values failed to be mapped to the required
              number of devices. For instance:

                 bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

              shows that when rule 1 was required to map 7 devices, it
              could map only six: [8,10,2,11,6,9].

       --show-utilization
              Displays the expected and actual utilization for each device,
              for each number of replicas. The expected count is the total
              number of stored values divided among the devices in
              proportion to their weights. For instance:

                 device 0: stored : 951      expected : 853.333
                 device 1: stored : 963      expected : 853.333
                 ...

              shows that device 0 stored 951 values and was expected to
              store approximately 853. Implies --show-statistics.
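
              For instance (the map file name is a placeholder; --num-rep,
              not described in this page, sets the number of replicas to
              simulate):

                 $ crushtool -i crushmap --test --show-utilization \
                       --num-rep 5
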
       --show-utilization-all
              Displays the same as --show-utilization but does not suppress
              output when the weight of a device is zero. Implies
              --show-statistics.

       --show-choose-tries
              Displays how many attempts were needed to find a device
              mapping. For instance:

                 0:     95224
                 1:      3745
                 2:      2225
                 ..

              shows that 95224 mappings succeeded without retries, 3745
              mappings succeeded with one retry, etc. There are as many
              rows as the value of the --set-choose-total-tries option.
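
              For instance, to see how the retry histogram changes when the
              total number of tries is raised (the map file name is a
              placeholder):

                 $ crushtool -i crushmap --test --show-choose-tries \
                       --set-choose-total-tries 100
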
       --output-csv
              Creates CSV files (in the current directory) containing
              information documented by --help-output. The files are named
              after the rule used when collecting the statistics. For
              instance, if the rule 'metadata' is used, the CSV files will
              be:

                 metadata-absolute_weights.csv
                 metadata-device_utilization.csv
                 ...

              The first line of each file briefly describes the column
              layout. For instance:

                 metadata-absolute_weights.csv
                 Device ID, Absolute Weight
                 0,1
                 ...

       --output-name NAME
              Prepend NAME to the file names generated when --output-csv
              is specified. For instance --output-name FOO will create
              files:

                 FOO-metadata-absolute_weights.csv
                 FOO-metadata-device_utilization.csv
                 ...
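
              For instance, to collect CSV statistics under a common prefix
              (the map file name is a placeholder):

                 $ crushtool -i crushmap --test --output-csv \
                       --output-name FOO
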
       The --set-... options can be used to modify the tunables of the
       input crush map. The input crush map is modified in memory. For
       example:

          $ crushtool -i mymap --test --show-bad-mappings
          bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

       could be fixed by increasing choose-total-tries as follows:

          $ crushtool -i mymap --test \
                 --show-bad-mappings --set-choose-total-tries 500


BUILDING A MAP WITH --BUILD

       The build mode will generate hierarchical maps. The first argument
       specifies the number of devices (leaves) in the CRUSH hierarchy.
       Each layer describes how the layer (or devices) preceding it should
       be grouped.

       Each layer consists of:

          bucket ( uniform | list | tree | straw | straw2 ) size

       The first component, bucket, names the type of the buckets in the
       layer (e.g. "rack"). Each bucket name will be built by appending a
       unique number to the bucket string (e.g. "rack0", "rack1"...).

       The second component is the bucket algorithm: straw should be used
       most of the time.

       The third component is the maximum size of the bucket. A size of
       zero means a bucket of infinite capacity.


EXAMPLE

       Suppose we have two rows with two racks each and 20 nodes per rack.
       Suppose each node contains 4 storage devices for Ceph OSD Daemons.
       This configuration allows us to deploy 320 Ceph OSD Daemons. Let's
       assume a 42U rack with 2U nodes, leaving an extra 2U for a rack
       switch.

       To reflect our hierarchy of devices, nodes, racks and rows, we would
       execute the following:

          $ crushtool -o crushmap --build --num_osds 320 \
                 node straw 4 \
                 rack straw 20 \
                 row straw 2 \
                 root straw 0
          # id        weight  type name       reweight
          -87 320     root root
          -85 160             row row0
          -81 80                      rack rack0
          -1  4                               node node0
          0   1                                       osd.0   1
          1   1                                       osd.1   1
          2   1                                       osd.2   1
          3   1                                       osd.3   1
          -2  4                               node node1
          4   1                                       osd.4   1
          5   1                                       osd.5   1
          ...

       CRUSH rules are created so the generated crushmap can be tested.
       They are the same rules as the ones created by default when creating
       a new Ceph cluster. They can be further edited with:

          # decompile
          crushtool -d crushmap -o map.txt

          # edit
          emacs map.txt

          # recompile
          crushtool -c map.txt -o crushmap
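
       The resulting map can then be exercised in test mode, for instance
       (rule number 0 and three replicas are illustrative; --rule and
       --num-rep, not described in this page, select them):

          $ crushtool -i crushmap --test --show-statistics \
                 --rule 0 --num-rep 3
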

RECLASSIFY

       The reclassify function allows users to transition from older maps
       that maintain parallel hierarchies for OSDs of different types to a
       modern CRUSH map that makes use of the device class feature. For
       more information, see
       https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.
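
       A sketch of a typical migration, adapted from the linked guide (the
       map file names and the %-ssd bucket naming pattern are illustrative
       and depend on the legacy map being converted):

          $ crushtool -i original --reclassify \
                 --set-subtree-class default hdd \
                 --reclassify-root default hdd \
                 --reclassify-bucket %-ssd ssd default \
                 -o adjusted

       The old and new maps can then be checked against each other with
       crushtool -i original --compare adjusted.
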

EXAMPLE OUTPUT FROM --TEST

       See
       https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
       for sample crushtool --test commands and the output they produce.


AVAILABILITY

       crushtool is part of Ceph, a massively scalable, open-source,
       distributed storage system. Please refer to the Ceph documentation
       at http://ceph.com/docs for more information.


SEE ALSO

       ceph(8), osdmaptool(8)


AUTHORS

       John Wilkins, Sage Weil, Loic Dachary

COPYRIGHT

       2010-2021, Inktank Storage, Inc. and contributors. Licensed under
       Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0)

dev                              May 13, 2021                     CRUSHTOOL(8)