CRUSHTOOL(8)                          Ceph                        CRUSHTOOL(8)

NAME
       crushtool - CRUSH map manipulation tool

SYNOPSIS
       crushtool ( -d map | -c map.txt | --build --num_osds numosds
       layer1 ... | --test ) [ -o outfile ]

DESCRIPTION
       crushtool is a utility that lets you create, compile, decompile and
       test CRUSH map files.

       CRUSH is a pseudo-random data distribution algorithm that efficiently
       maps input values (which, in the context of Ceph, correspond to
       Placement Groups) across a heterogeneous, hierarchically structured
       device map. The algorithm was originally described in detail in the
       following paper (although it has evolved somewhat since then):

              http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

       The tool has four modes of operation.

       --compile|-c map.txt
              will compile a plaintext map.txt into a binary map file.

       --decompile|-d map
              will take the compiled map and decompile it into a plaintext
              source file, suitable for editing.

       --build --num_osds {num-osds} layer1 ...
              will create a map with the given layer structure. See below
              for a detailed explanation.

       --test will perform a dry run of a CRUSH mapping for a range of
              input values [--min-x,--max-x] (default [0,1023]), which can
              be thought of as simulated Placement Groups. See below for a
              more detailed explanation.
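
       For instance, a minimal end-to-end dry run might look as follows (a
       sketch: it assumes the compiled map defines a rule 0, and uses the
       --rule and --num-rep test options to select the rule and replica
       count):

```shell
# Compile the plaintext map, then simulate 1024 placements (the
# default [--min-x,--max-x] range) through rule 0 with 3 replicas,
# printing a summary of the resulting distribution.
crushtool -c map.txt -o crushmap
crushtool -i crushmap --test --rule 0 --num-rep 3 --show-statistics
```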

       Unlike other Ceph tools, crushtool does not accept generic options
       such as --debug-crush from the command line. They can, however, be
       provided via the CEPH_ARGS environment variable. For instance, to
       silence all output from the CRUSH subsystem:

              CEPH_ARGS="--debug-crush 0" crushtool ...

RUNNING TESTS WITH --TEST
       The test mode will use the input crush map (as specified with -i
       map) and perform a dry run of CRUSH mapping or random placement (if
       --simulate is set). On completion, two kinds of reports can be
       created: the --show-... options output human-readable information on
       stderr, and the --output-csv option creates CSV files that are
       documented by the --help-output option.

       Note: Each Placement Group (PG) has an integer ID which can be
       obtained from ceph pg dump (for example PG 2.2f means pool id 2, PG
       id 0x2f = 47). The pool and PG IDs are combined by a function to get
       a value which is given to CRUSH to map it to OSDs. crushtool does
       not know about PGs or pools; it only runs simulations by mapping
       values in the range [--min-x,--max-x].

       --show-statistics
              Displays a summary of the distribution. For instance:

                     rule 1 (metadata) num_rep 5 result size == 5: 1024/1024

              shows that rule 1, which is named metadata, successfully
              mapped 1024 values to result size == 5 devices when trying to
              map them to num_rep 5 replicas. When it fails to provide the
              required mapping, presumably because the number of tries must
              be increased, a breakdown of the failures is displayed. For
              instance:

                     rule 1 (metadata) num_rep 10 result size == 8:  4/1024
                     rule 1 (metadata) num_rep 10 result size == 9:  93/1024
                     rule 1 (metadata) num_rep 10 result size == 10: 927/1024

              shows that although num_rep 10 replicas were required, 4 out
              of 1024 values (4/1024) were mapped to only result size == 8
              devices.

       --show-mappings
              Displays the mapping of each value in the range
              [--min-x,--max-x]. For instance:

                     CRUSH rule 1 x 24 [11,6]

              shows that value 24 is mapped to devices [11,6] by rule 1.
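
              For example, a run restricted to a handful of values might
              look like this (a sketch assuming the map defines rule 1):

```shell
# Map only the values 0..3 through rule 1 with two replicas,
# printing one "CRUSH rule 1 x <value> [<devices>]" line per value.
crushtool -i crushmap --test --rule 1 --num-rep 2 \
          --min-x 0 --max-x 3 --show-mappings
```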

       --show-bad-mappings
              Displays which values failed to be mapped to the required
              number of devices. For instance:

                     bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

              shows that when rule 1 was required to map 7 devices, it
              could map only six: [8,10,2,11,6,9].
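
              A common way to probe for such failures is to request more
              replicas than the map can comfortably provide (a sketch
              assuming the map defines rule 1):

```shell
# Ask rule 1 for seven replicas; every value that cannot be mapped
# to seven distinct devices is reported as a "bad mapping" line.
crushtool -i crushmap --test --rule 1 --num-rep 7 --show-bad-mappings
```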

       --show-utilization
              Displays the expected and actual utilization for each device,
              for each number of replicas. For instance:

                     device 0: stored : 951 expected : 853.333
                     device 1: stored : 963 expected : 853.333
                     ...

              shows that device 0 stored 951 values and was expected to
              store approximately 853. Implies --show-statistics.

       --show-utilization-all
              Displays the same as --show-utilization but does not suppress
              output when the weight of a device is zero. Implies
              --show-statistics.

       --show-choose-tries
              Displays how many attempts were needed to find a device
              mapping. For instance:

                     0:      95224
                     1:       3745
                     2:       2225
                     ..

              shows that 95224 mappings succeeded without retries, 3745
              mappings succeeded after one retry, etc. There are as many
              rows as the value of the --set-choose-total-tries option.

       --output-csv
              Creates CSV files (in the current directory) containing
              information documented by --help-output. The files are named
              after the rule used when collecting the statistics. For
              instance, if the rule 'metadata' is used, the CSV files will
              be:

                     metadata-absolute_weights.csv
                     metadata-device_utilization.csv
                     ...

              The first line of each file briefly describes the column
              layout. For instance:

                     metadata-absolute_weights.csv
                     Device ID, Absolute Weight
                     0,1
                     ...

       --output-name NAME
              Prepend NAME to the file names generated when --output-csv is
              specified. For instance --output-name FOO will create files:

                     FOO-metadata-absolute_weights.csv
                     FOO-metadata-device_utilization.csv
                     ...
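
              The two options can be combined; for instance (a sketch,
              assuming the tested map contains a rule named 'metadata'):

```shell
# Write per-rule statistics to CSV files prefixed with "FOO-",
# e.g. FOO-metadata-device_utilization.csv, in the current directory.
crushtool -i crushmap --test --output-csv --output-name FOO
```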

       The --set-... options can be used to modify the tunables of the
       input crush map. The input crush map is modified in memory. For
       example:

              $ crushtool -i mymap --test --show-bad-mappings
              bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]

       could be fixed by increasing the choose-total-tries as follows:

              $ crushtool -i mymap --test \
                    --show-bad-mappings \
                    --set-choose-total-tries 500

BUILDING A MAP
       The build mode will generate hierarchical maps. The first argument
       specifies the number of devices (leaves) in the CRUSH hierarchy.
       Each layer describes how the layer (or devices) preceding it should
       be grouped.

       Each layer consists of:

              bucket ( uniform | list | tree | straw ) size

       The first component, bucket, is the type of the buckets in the layer
       (e.g. "rack"). Each bucket name will be built by appending a unique
       number to the bucket string (e.g. "rack0", "rack1"...).

       The second component is the type of bucket: straw should be used
       most of the time.

       The third component is the maximum size of the bucket. A size of
       zero means a bucket of infinite capacity.
187
189 Suppose we have two rows with two racks each and 20 nodes per rack.
190 Suppose each node contains 4 storage devices for Ceph OSD Daemons. This
191 configuration allows us to deploy 320 Ceph OSD Daemons. Lets assume a
192 42U rack with 2U nodes, leaving an extra 2U for a rack switch.
193
194 To reflect our hierarchy of devices, nodes, racks and rows, we would
195 execute the following:
196
197 $ crushtool -o crushmap --build --num_osds 320 \
198 node straw 4 \
199 rack straw 20 \
200 row straw 2 \
201 root straw 0
              # id   weight  type name       reweight
              -87    320     root root
              -85    160     row row0
              -81    80      rack rack0
              -1     4       node node0
              0      1       osd.0           1
              1      1       osd.1           1
              2      1       osd.2           1
              3      1       osd.3           1
              -2     4       node node1
              4      1       osd.4           1
              5      1       osd.5           1
              ...

       CRUSH rules are created so that the generated crushmap can be
       tested. They are the same rules as the ones created by default when
       creating a new Ceph cluster. They can be further edited with:

              # decompile
              crushtool -d crushmap -o map.txt

              # edit
              emacs map.txt

              # recompile
              crushtool -c map.txt -o crushmap

RECLASSIFY
       The reclassify function allows users to transition from older maps
       that maintain parallel hierarchies for OSDs of different types to a
       modern CRUSH map that makes use of the device class feature. For
       more information, see
       http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/#migrating-from-a-legacy-ssd-rule-to-device-classes.
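
       As a sketch of that migration (the root name, device classes and
       bucket naming pattern below are illustrative; see the linked
       documentation for the reclassify options that match your map):

```shell
# Assign the devices under root "default" to class "hdd", fold
# parallel buckets named like "<host>-ssd" into class "ssd" under
# "default", and write the converted map to adjusted.map.
crushtool -i original.map --reclassify \
          --reclassify-root default hdd \
          --reclassify-bucket %-ssd ssd default \
          -o adjusted.map
```

       The resulting map can then be checked against the original with
       crushtool -i original.map --compare adjusted.map before it is
       injected into the cluster.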

EXAMPLE
       See
       https://github.com/ceph/ceph/blob/master/src/test/cli/crushtool/set-choose.t
       for sample crushtool --test commands and the output they produce.

AVAILABILITY
       crushtool is part of Ceph, a massively scalable, open-source,
       distributed storage system. Please refer to the Ceph documentation
       at http://ceph.com/docs for more information.

SEE ALSO
       ceph(8), osdmaptool(8)

AUTHOR
       John Wilkins, Sage Weil, Loic Dachary

COPYRIGHT
       2010-2019, Inktank Storage, Inc. and contributors. Licensed under
       Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0)



dev                               Dec 10, 2019                    CRUSHTOOL(8)