CEPH-BLUESTORE-TOOL(8) Ceph CEPH-BLUESTORE-TOOL(8)

ceph-bluestore-tool - bluestore administrative tool

ceph-bluestore-tool command
[ --dev device ... ]
[ -i osd_id ]
[ --path osd path ]
[ --out-dir dir ]
[ --log-file | -l filename ]
[ --deep ]
ceph-bluestore-tool fsck|repair --path osd path [ --deep ]
ceph-bluestore-tool qfsck --path osd path
ceph-bluestore-tool allocmap --path osd path
ceph-bluestore-tool restore_cfb --path osd path
ceph-bluestore-tool show-label --dev device ...
ceph-bluestore-tool prime-osd-dir --dev device --path osd path
ceph-bluestore-tool bluefs-export --path osd path --out-dir dir
ceph-bluestore-tool bluefs-bdev-new-wal --path osd path --dev-target new-device
ceph-bluestore-tool bluefs-bdev-new-db --path osd path --dev-target new-device
ceph-bluestore-tool bluefs-bdev-migrate --path osd path --dev-target new-device --devs-source device1 [--devs-source device2]
ceph-bluestore-tool free-dump|free-score --path osd path [ --allocator block/bluefs-wal/bluefs-db/bluefs-slow ]
ceph-bluestore-tool reshard --path osd path --sharding new sharding [ --resharding-ctrl control string ]
ceph-bluestore-tool show-sharding --path osd path

ceph-bluestore-tool is a utility to perform low-level administrative
operations on a BlueStore instance.

help
Show help.

fsck [ --deep ]
Run a consistency check on BlueStore metadata. If --deep is specified,
also read all object data and verify checksums.
42
43 repair
44 Run a consistency check and repair any errors we can.
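
A common sequence is to check first and attempt a repair only when the check fails. The sketch below is illustrative rather than authoritative: the OSD path is hypothetical, the OSD is assumed to be stopped, and fsck is assumed to exit non-zero when it finds errors (verify this for your release). The run helper is not part of Ceph; it prints each command unless RUN=1 is set.

```shell
# Hypothetical OSD data directory; adjust for your deployment.
OSD_PATH=${OSD_PATH:-/var/lib/ceph/osd/ceph-0}

# Illustrative helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Check metadata first; attempt a repair only if the check fails.
# Assumption: fsck exits non-zero when it finds errors.
check_then_repair() {
    if run ceph-bluestore-tool fsck --path "$1"; then
        return 0
    fi
    run ceph-bluestore-tool repair --path "$1"
}

check_then_repair "$OSD_PATH"
```

In the default dry-run mode only the fsck command is printed, because the helper itself always succeeds.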

qfsck
Run a consistency check on BlueStore metadata, comparing allocator data
(taken from the RocksDB CFB when it exists, otherwise from the
allocation file) with the ONode state.

allocmap
Perform the same check as qfsck, then store a new allocation file. This
command is disabled by default and requires a special build.

restore_cfb
Reverse the changes made by the new NCB code (either through a Ceph
restart or by running the allocmap command) and restore the RocksDB B
column family (allocator map).

bluefs-export
Export the contents of BlueFS (i.e., RocksDB files) to an output
directory.

bluefs-bdev-sizes --path osd path
Print the device sizes, as understood by BlueFS, to stdout.

bluefs-bdev-expand --path osd path
Instruct BlueFS to check the size of its block devices and, if they
have expanded, make use of the additional space. Note that only new
files created by BlueFS are allocated on the preferred block device
(provided it has enough free space). Existing files that have spilled
over to the slow device are removed gradually as RocksDB performs
compaction. In other words, any data spilled over to the slow device
moves to the fast device over time.
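
After the underlying device has been enlarged, it can help to print the sizes before and after the expansion to confirm that BlueFS picked up the new space. This is a minimal sketch with a hypothetical OSD path. The run helper is illustrative only and prints the commands unless RUN=1 is set.

```shell
# Hypothetical OSD data directory; adjust for your deployment.
OSD_PATH=${OSD_PATH:-/var/lib/ceph/osd/ceph-0}

# Illustrative helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Show sizes, expand, then show sizes again; stop at the first failure.
expand_bluefs() {
    run ceph-bluestore-tool bluefs-bdev-sizes --path "$1" &&
    run ceph-bluestore-tool bluefs-bdev-expand --path "$1" &&
    run ceph-bluestore-tool bluefs-bdev-sizes --path "$1"
}

expand_bluefs "$OSD_PATH"
```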

bluefs-bdev-new-wal --path osd path --dev-target new-device
Add a WAL device to BlueFS. Fails if a WAL device already exists.

bluefs-bdev-new-db --path osd path --dev-target new-device
Add a DB device to BlueFS. Fails if a DB device already exists.

bluefs-bdev-migrate --path osd path --dev-target new-device
--devs-source device1 [--devs-source device2]
Move BlueFS data from the source device(s) to the target device. On
success, the source devices (except the main one) are removed. The
target can be either an already attached device or a new one. In the
latter case it is added to the OSD, replacing one of the source
devices. The following replacement rules apply, in order of precedence,
stopping at the first match:

• if the source list has a DB volume, the target device replaces it.

• if the source list has a WAL volume, the target device replaces it.

• if the source list has only the slow volume, the operation is not
permitted; allocate the device explicitly with the bluefs-bdev-new-db
or bluefs-bdev-new-wal command.
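
The precedence rules for a new target device can be sketched as a small shell function. This only models the decision described above and does not call the tool. The role names db, wal and slow are hypothetical labels chosen for this illustration.

```shell
# Decide which volume a *new* target device would replace, given the
# roles of the source volumes (hypothetical labels: db, wal, slow).
migrate_replaces() {
    has_db=0
    has_wal=0
    for role in "$@"; do
        case $role in
            db)  has_db=1 ;;
            wal) has_wal=1 ;;
        esac
    done
    if [ "$has_db" = 1 ]; then
        echo db        # rule 1: a DB volume in the list is replaced
    elif [ "$has_wal" = 1 ]; then
        echo wal       # rule 2: otherwise a WAL volume is replaced
    else
        # rule 3: slow volume only, migration to a new device is refused
        echo "not permitted: use bluefs-bdev-new-db or bluefs-bdev-new-wal" >&2
        return 1
    fi
}

migrate_replaces wal db
```

For example, with both a WAL and a DB volume in the source list, the new device takes the place of the DB volume.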

show-label --dev device [...]
Show device label(s).

free-dump --path osd path [ --allocator
block/bluefs-wal/bluefs-db/bluefs-slow ]
Dump all free regions in the allocator.

free-score --path osd path [ --allocator
block/bluefs-wal/bluefs-db/bluefs-slow ]
Give a number in the range [0-1] that rates the fragmentation of the
allocator. 0 means that all free space is in one chunk; 1 represents
the worst possible fragmentation.
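
The following sketch assembles a free-dump or free-score command line and rejects allocator names other than the four listed above. The free_cmd function and the OSD path are hypothetical. The function only prints the command and does not run it.

```shell
# Print a free-dump or free-score command line for an optional allocator.
free_cmd() {
    action=$1
    path=$2
    allocator=${3:-}
    case $action in
        free-dump|free-score) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    if [ -z "$allocator" ]; then
        echo "ceph-bluestore-tool $action --path $path"
        return 0
    fi
    case $allocator in
        block|bluefs-wal|bluefs-db|bluefs-slow)
            echo "ceph-bluestore-tool $action --path $path --allocator $allocator" ;;
        *) echo "unknown allocator: $allocator" >&2; return 1 ;;
    esac
}

# Hypothetical OSD path, for illustration only.
free_cmd free-score /var/lib/ceph/osd/ceph-0 block
```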

reshard --path osd path --sharding new sharding [ --resharding-ctrl
control string ]
Change the sharding of BlueStore's RocksDB. Sharding is built on top of
RocksDB column families. This command makes it possible to test the
performance of a new sharding without redeploying the OSD. Resharding
is usually a long process that walks through the entire RocksDB key
space and moves some keys to different column families. The
--resharding-ctrl option provides performance control over the
resharding process. An interrupted resharding prevents the OSD from
running but does not corrupt data. It is always possible to continue
the previous resharding or to select any other sharding scheme,
including reverting to the original one.
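
A resharding run might look like the sketch below, which records the current sharding, applies a new one, and shows the result. The OSD path is hypothetical, and the sharding definition is recalled from the Ceph BlueStore configuration documentation, so treat it as an assumption and check the value for your release. The run helper is illustrative only; it prints the commands unless RUN=1 is set, and the printed form does not preserve shell quoting.

```shell
# Hypothetical OSD data directory; adjust for your deployment.
OSD_PATH=${OSD_PATH:-/var/lib/ceph/osd/ceph-0}

# Assumed sharding definition; verify it against the Ceph documentation.
NEW_SHARDING='m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P'

# Illustrative helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Show the current sharding, reshard, then show the new sharding.
reshard_osd() {
    run ceph-bluestore-tool show-sharding --path "$1" &&
    run ceph-bluestore-tool reshard --path "$1" --sharding "$2" &&
    run ceph-bluestore-tool show-sharding --path "$1"
}

reshard_osd "$OSD_PATH" "$NEW_SHARDING"
```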

show-sharding --path osd path
Show the sharding that is currently applied to BlueStore's RocksDB.

--dev *device*
Add device to the list of devices to consider.

-i *osd_id*
Operate as OSD osd_id. Connect to the monitor for OSD-specific
options. If the monitor is unavailable, add --no-mon-config to read
from ceph.conf instead.

--devs-source *device*
Add device to the list of devices to consider as sources for the
migrate operation.

--dev-target *device*
Specify the target device for the migrate operation, or the device to
add when adding a new DB/WAL.

--path *osd path*
Specify an OSD path. In most cases, the device list is inferred from
the symlinks present in the OSD path. This is usually simpler than
explicitly specifying the device(s) with --dev. Not necessary if
-i osd_id is provided.

--out-dir *dir*
Output directory for bluefs-export.

-l, --log-file *log file*
File to log to.

--log-level *num*
Debug log level. Default is 30 (extremely verbose), 20 is very
verbose, 10 is verbose, and 1 is not very verbose.

--deep
Deep scrub/repair (read and validate object data, not just metadata).

--allocator *name*
Useful for the free-dump and free-score actions. Selects the
allocator(s).
168
169 --resharding-ctrl *control string*
170 Provides control over resharding process. Specifies how often
171 refresh RocksDB iterator, and how large should commit batch be
172 before committing to RocksDB. Option format is: <iterator_re‐
173 fresh_bytes>/<iterator_refresh_keys>/<batch_com‐
174 mit_bytes>/<batch_commit_keys> Default:
175 10000000/10000/1000000/1000
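
The control string is four slash-separated integers in the order given above. As a minimal sketch, the hypothetical helper below builds such a string, falls back to the documented defaults for omitted fields, and rejects non-numeric input. It is not part of Ceph.

```shell
# Build a --resharding-ctrl value from up to four integers, in the order
# iterator_refresh_bytes, iterator_refresh_keys, batch_commit_bytes,
# batch_commit_keys. Omitted fields use the documented defaults.
resharding_ctrl() {
    iter_bytes=${1:-10000000}
    iter_keys=${2:-10000}
    batch_bytes=${3:-1000000}
    batch_keys=${4:-1000}
    for v in "$iter_bytes" "$iter_keys" "$batch_bytes" "$batch_keys"; do
        case $v in
            ''|*[!0-9]*)
                echo "not a non-negative integer: $v" >&2
                return 1 ;;
        esac
    done
    printf '%s/%s/%s/%s\n' "$iter_bytes" "$iter_keys" "$batch_bytes" "$batch_keys"
}

# Halve the iterator refresh thresholds, keep the default batch sizes.
resharding_ctrl 5000000 5000
```

The resulting string can then be passed as the argument of --resharding-ctrl on a reshard command line.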

Any configuration option accepted by the OSD can also be passed to
ceph-bluestore-tool. This is useful for providing necessary
configuration options when access to the monitor or ceph.conf is
impossible and the -i option cannot be used.

Every BlueStore block device has a single block label at the beginning
of the device. You can dump the contents of the label with:

ceph-bluestore-tool show-label --dev *device*

The main device will have a lot of metadata, including information that
used to be stored in small files in the OSD data directory. The
auxiliary devices (db and wal) will only have the minimum required
fields (OSD UUID, size, device type, birth time).

You can generate the content for an OSD data directory that can start
up a BlueStore OSD with the prime-osd-dir command:

ceph-bluestore-tool prime-osd-dir --dev *main device* --path /var/lib/ceph/osd/ceph-*id*

Some versions of BlueStore were susceptible to the BlueFS log growing
extremely large, to the point where booting the OSD becomes impossible.
This state is indicated by a boot that takes a very long time and then
fails in the _replay function.

This can be fixed with:

ceph-bluestore-tool fsck --path osd path --bluefs_replay_recovery=true

It is advised to first check whether the rescue process would be
successful:

ceph-bluestore-tool fsck --path osd path --bluefs_replay_recovery=true --bluefs_replay_recovery_disable_compact=true

If the above fsck is successful, the fix procedure can be applied.
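
The two steps can be chained so that the fix runs only when the trial succeeds. This is a sketch under stated assumptions: the OSD path is hypothetical, and the trial fsck is assumed to exit non-zero when the rescue would fail. The run helper is illustrative only and prints the commands unless RUN=1 is set.

```shell
# Hypothetical OSD data directory; adjust for your deployment.
OSD_PATH=${OSD_PATH:-/var/lib/ceph/osd/ceph-0}

# Illustrative helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

rescue_bluefs_log() {
    # Step 1: trial run, replay recovery with log compaction disabled.
    run ceph-bluestore-tool fsck --path "$1" \
        --bluefs_replay_recovery=true \
        --bluefs_replay_recovery_disable_compact=true || return 1
    # Step 2: reached only if the trial succeeded; apply the fix.
    run ceph-bluestore-tool fsck --path "$1" --bluefs_replay_recovery=true
}

rescue_bluefs_log "$OSD_PATH"
```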

ceph-bluestore-tool is part of Ceph, a massively scalable, open-source,
distributed storage system. Please refer to the Ceph documentation at
https://docs.ceph.com for more information.

ceph-osd(8)

2010-2023, Inktank Storage, Inc. and contributors. Licensed under
Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0)

dev Nov 15, 2023 CEPH-BLUESTORE-TOOL(8)