OSDMAPTOOL(8)                        Ceph                       OSDMAPTOOL(8)



NAME
       osdmaptool - ceph osd cluster map manipulation tool

SYNOPSIS
       osdmaptool mapfilename [--print] [--createsimple numosd
       [--pgbits bitsperosd]] [--clobber]
       osdmaptool mapfilename [--import-crush crushmap]
       osdmaptool mapfilename [--export-crush crushmap]
       osdmaptool mapfilename [--upmap file] [--upmap-max max-optimizations]
       [--upmap-deviation max-deviation] [--upmap-pool poolname]
       [--save] [--upmap-active]
       osdmaptool mapfilename [--upmap-cleanup] [--upmap file]

DESCRIPTION
       osdmaptool is a utility that lets you create, view, and manipulate
       OSD cluster maps from the Ceph distributed storage system. Notably,
       it lets you extract the embedded CRUSH map or import a new CRUSH
       map. It can also simulate the upmap balancer mode so you can get a
       sense of what is needed to balance your PGs.
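
       For example, a typical session starts from a map exported from a
       live cluster (a sketch; it assumes a running cluster with admin
       credentials, and 'osdmap' is a placeholder file name):

          # export the cluster's current OSD map to a local file
          ceph osd getmap -o osdmap

          # inspect the exported map offline
          osdmaptool --print osdmap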

OPTIONS
       --print
              will simply make the tool print a plaintext dump of the
              map, after any modifications are made.

       --dump <format>
              displays the map in the specified <format>: plain text when
              <format> is 'plain'; 'json' is used when the specified
              format is not supported. This is an alternative to the
              --print option.
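
              For example, to dump a previously saved map as JSON (a
              sketch; 'osdmap' is a placeholder file name):

                 osdmaptool osdmap --dump json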

       --clobber
              will allow osdmaptool to overwrite mapfilename if changes
              are made.

       --import-crush mapfile
              will load the CRUSH map from mapfile and embed it in the
              OSD map.

       --export-crush mapfile
              will extract the CRUSH map from the OSD map and write it to
              mapfile.
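
              These two options are commonly paired with crushtool(8) to
              edit the CRUSH map offline; a sketch of the round trip
              (file names are placeholders):

                 # extract the CRUSH map from the OSD map
                 osdmaptool osdmap --export-crush crush.bin

                 # decompile to text, edit, then recompile
                 crushtool -d crush.bin -o crush.txt
                 crushtool -c crush.txt -o crush.new

                 # embed the edited CRUSH map back into the OSD map
                 osdmaptool osdmap --import-crush crush.new --clobber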

       --createsimple numosd [--pg-bits bitsperosd] [--pgp-bits bits]
              will create a relatively generic OSD map with numosd
              devices. If --pg-bits is specified, the initial placement
              group counts will be set with bitsperosd bits per OSD; that
              is, the pg_num map attribute will be set to numosd shifted
              left by bitsperosd. If --pgp-bits is specified, then the
              pgp_num map attribute will be set to numosd shifted left by
              bits.
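
              For example, creating a map with 16 OSDs and 6 bits per OSD
              sets pg_num to 16 << 6 = 1024 (a sketch; the values are
              placeholders):

                 osdmaptool --createsimple 16 --pg-bits 6 osdmap --clobber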

       --create-from-conf
              creates an osd map with default configurations.

       --test-map-pgs [--pool poolid] [--range-first <first> --range-last <last>]
              will print out the mappings from placement groups to OSDs.
              If a range is specified, it iterates over the files named
              first through last in the directory given as the argument
              to osdmaptool. E.g., osdmaptool --test-map-pgs
              --range-first 0 --range-last 2 osdmap_dir will iterate
              through the files named 0, 1, and 2 in osdmap_dir.

       --test-map-pgs-dump [--pool poolid] [--range-first <first> --range-last <last>]
              will print out a summary of all placement groups and their
              mappings to OSDs. If a range is specified, it iterates over
              the files named first through last in the directory given
              as the argument to osdmaptool. E.g., osdmaptool
              --test-map-pgs-dump --range-first 0 --range-last 2
              osdmap_dir will iterate through the files named 0, 1, and 2
              in osdmap_dir.

       --test-map-pgs-dump-all [--pool poolid] [--range-first <first> --range-last <last>]
              will print out a summary of all placement groups and their
              mappings to all OSDs. If a range is specified, it iterates
              over the files named first through last in the directory
              given as the argument to osdmaptool. E.g., osdmaptool
              --test-map-pgs-dump-all --range-first 0 --range-last 2
              osdmap_dir will iterate through the files named 0, 1, and 2
              in osdmap_dir.

       --test-random
              does a random mapping of placement groups to the OSDs.

       --test-map-pg <pgid>
              maps a particular placement group (specified by pgid) to
              the OSDs.

       --test-map-object <objectname> [--pool <poolid>]
              maps a particular object (specified by objectname) to the
              OSDs.
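
              For example, to see which OSDs an object named 'foo' in
              pool 1 would map to (a sketch; the object name and pool id
              are placeholders):

                 osdmaptool osdmap --test-map-object foo --pool 1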

       --test-crush [--range-first <first> --range-last <last>]
              maps placement groups to the acting OSDs. If a range is
              specified, it iterates over the files named first through
              last in the directory given as the argument to osdmaptool.
              E.g., osdmaptool --test-crush --range-first 0 --range-last
              2 osdmap_dir will iterate through the files named 0, 1, and
              2 in osdmap_dir.

       --mark-up-in
              marks OSDs up and in (but does not persist the change).

       --mark-out <osdid>
              marks an OSD as out (but does not persist the change).

       --mark-up <osdid>
              marks an OSD as up (but does not persist the change).

       --mark-in <osdid>
              marks an OSD as in (but does not persist the change).

       --tree Displays a hierarchical tree of the map.

       --clear-temp
              clears the pg_temp and primary_temp variables.

       --clean-temps
              cleans pg_temp entries.

       --health
              dumps health checks.

       --with-default-pool
              includes a default pool when creating the map.

       --upmap-cleanup <file>
              cleans up pg_upmap[_items] entries, writing commands to
              <file> [default: - for stdout].

       --upmap <file>
              calculates pg upmap entries to balance the PG layout,
              writing commands to <file> [default: - for stdout].
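
              The output file contains ceph CLI commands (ceph osd
              pg-upmap-items ...) that can be reviewed and then applied
              to a live cluster; a minimal offline-balancing sketch (the
              pool and file names are placeholders):

                 # export the current map, compute changes for one pool,
                 # then apply the generated commands to the cluster
                 ceph osd getmap -o osdmap
                 osdmaptool osdmap --upmap out.txt --upmap-pool rbd
                 source out.txt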

       --upmap-max <max-optimizations>
              sets the maximum number of upmap entries to calculate
              [default: 10].

       --upmap-deviation <max-deviation>
              maximum deviation from the target [default: 5].

       --upmap-pool <poolname>
              restricts upmap balancing to a single pool; the option can
              be repeated for multiple pools.

       --upmap-active
              acts like an active balancer, applying changes repeatedly
              until the layout is balanced.

       --adjust-crush-weight <osdid:weight>[,<osdid:weight>,<...>]
              changes the CRUSH weight of each given <osdid>.
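
              For example, to preview how setting the CRUSH weight of
              osd.0 to 0.5 changes the mappings, and to keep the adjusted
              map, one might run (a sketch; the weight is a placeholder):

                 osdmaptool osdmap --adjust-crush-weight 0:0.5 --test-map-pgs --save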

       --save writes the modified osdmap, including upmap or crush-adjust
              changes.

       --read <file>
              calculates pg upmap entries to balance pg primaries,
              writing commands to <file>.

       --read-pool <poolname>
              specifies the pool the read balancer should adjust.

       --vstart
              prefixes the upmap and read output with './bin/'.

EXAMPLE
       To create a simple map with 16 devices:

          osdmaptool --createsimple 16 osdmap --clobber

       To view the result:

          osdmaptool --print osdmap

       To view the mappings of placement groups for pool 1:

          osdmaptool osdmap --test-map-pgs-dump --pool 1

          pool 1 pg_num 8
          1.0 [0,2,1] 0
          1.1 [2,0,1] 2
          1.2 [0,1,2] 0
          1.3 [2,0,1] 2
          1.4 [0,2,1] 0
          1.5 [0,2,1] 0
          1.6 [0,1,2] 0
          1.7 [1,0,2] 1
          #osd   count  first  primary  c wt   wt
          osd.0  8      5      5        1      1
          osd.1  8      1      1        1      1
          osd.2  8      2      2        1      1
          in 3
          avg 8 stddev 0 (0x) (expected 2.3094 0.288675x))
          min osd.0 8
          max osd.0 8
          size 0  0
          size 1  0
          size 2  0
          size 3  8

       In this output:

       1. pool 1 has 8 placement groups, and two tables follow.

       2. A table of placement groups, one row per placement group, with
          the following columns:

          • placement group id,

          • acting set, and

          • primary OSD.

       3. A table of all OSDs, one row per OSD, with the following
          columns:

          • count of placement groups mapped to this OSD,

          • count of placement groups whose acting set lists this OSD
            first,

          • count of placement groups for which this OSD is the primary,

          • the CRUSH weight of this OSD, and

          • the weight of this OSD.

       4. Statistics on the number of placement groups held by the 3
          OSDs:

          • average, stddev, stddev/average, expected stddev, and
            expected stddev/average,

          • min and max.

       5. The number of placement groups mapped to n OSDs. In this case,
          all 8 placement groups are mapped to 3 different OSDs.

       In a less-balanced cluster, we could have the following output for
       the statistics of placement group distribution, whose standard
       deviation is 1.41421:
          #osd   count  first  primary  c wt       wt
          osd.0  33     9      9        0.0145874  1
          osd.1  34     14     14       0.0145874  1
          osd.2  31     7      7        0.0145874  1
          osd.3  31     13     13       0.0145874  1
          osd.4  30     14     14       0.0145874  1
          osd.5  33     7      7        0.0145874  1
          in 6
          avg 32 stddev 1.41421 (0.0441942x) (expected 5.16398 0.161374x))
          min osd.4 30
          max osd.1 34
          size 0  0
          size 1  0
          size 2  0
          size 3  64

       To simulate the active balancer in upmap mode:

          osdmaptool --upmap upmaps.out --upmap-active --upmap-deviation 6 --upmap-max 11 osdmap

          osdmaptool: osdmap file 'osdmap'
          writing upmap command output to: upmaps.out
          checking for upmap cleanups
          upmap, max-count 11, max deviation 6
          pools movies photos metadata data
          prepared 11/11 changes
          Time elapsed 0.00310404 secs
          pools movies photos metadata data
          prepared 11/11 changes
          Time elapsed 0.00283402 secs
          pools data metadata movies photos
          prepared 11/11 changes
          Time elapsed 0.003122 secs
          pools photos metadata data movies
          prepared 11/11 changes
          Time elapsed 0.00324372 secs
          pools movies metadata data photos
          prepared 1/11 changes
          Time elapsed 0.00222609 secs
          pools data movies photos metadata
          prepared 0/11 changes
          Time elapsed 0.00209916 secs
          Unable to find further optimization, or distribution is already perfect
          osd.0 pgs 41
          osd.1 pgs 42
          osd.2 pgs 42
          osd.3 pgs 41
          osd.4 pgs 46
          osd.5 pgs 39
          osd.6 pgs 39
          osd.7 pgs 43
          osd.8 pgs 41
          osd.9 pgs 46
          osd.10 pgs 46
          osd.11 pgs 46
          osd.12 pgs 46
          osd.13 pgs 41
          osd.14 pgs 40
          osd.15 pgs 40
          osd.16 pgs 39
          osd.17 pgs 46
          osd.18 pgs 46
          osd.19 pgs 39
          osd.20 pgs 42
          Total time elapsed 0.0167765 secs, 5 rounds

       To simulate the active balancer in read mode, first make sure
       capacity is balanced by running the balancer in upmap mode. Then,
       balance the reads on a replicated pool with:

          osdmaptool osdmap --read read.out --read-pool <pool name>

          osdmaptool: osdmap file 'osdmap'
          writing upmap command output to: read.out

          ---------- BEFORE ------------
          osd.0 | primary affinity: 1 | number of prims: 3
          osd.1 | primary affinity: 1 | number of prims: 10
          osd.2 | primary affinity: 1 | number of prims: 3

          read_balance_score of 'cephfs.a.meta': 1.88

          ---------- AFTER ------------
          osd.0 | primary affinity: 1 | number of prims: 5
          osd.1 | primary affinity: 1 | number of prims: 5
          osd.2 | primary affinity: 1 | number of prims: 6

          read_balance_score of 'cephfs.a.meta': 1.13

          num changes: 5

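       As with the upmap output, read.out contains ceph CLI commands
       (here, ceph osd pg-upmap-primary ...) that can be reviewed and
       then applied to a live cluster (a sketch):

          source read.out
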
AVAILABILITY
       osdmaptool is part of Ceph, a massively scalable, open-source,
       distributed storage system. Please refer to the Ceph documentation
       at https://docs.ceph.com for more information.

SEE ALSO
       ceph(8), crushtool(8)

COPYRIGHT
       2010-2023, Inktank Storage, Inc. and contributors. Licensed under
       Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0)



dev                              Nov 15, 2023                 OSDMAPTOOL(8)