OSMIUM-EXTRACT(1)                                    OSMIUM-EXTRACT(1)

NAME
osmium-extract - create geographical extracts from an OSM file

SYNOPSIS
osmium extract --config CONFIG-FILE [OPTIONS] OSM-FILE
osmium extract --bbox LEFT,BOTTOM,RIGHT,TOP [OPTIONS] OSM-FILE
osmium extract --polygon POLYGON-FILE [OPTIONS] OSM-FILE

DESCRIPTION
Create geographical extracts from an OSM data file or an OSM history
file. The region (geographical extent) can be given as a bounding box
or as a (multi)polygon.

There are three ways of calling this command:

· Specify a config file with the --config/-c option. It can define any
  number of regions you want to cut out. See the CONFIG FILE section
  for details.

· Specify a bounding box to cut out with the --bbox/-b option.

· Specify a (multi)polygon to cut out with the --polygon/-p option.

The input file is assumed to be ordered in the usual order: nodes
first, then ways, then relations.

If the --with-history option is used, the command will work correctly
for history files. This currently works for the complete_ways strategy
only; the simple and smart strategies do not work with history files.
A history extract will contain every version of all objects with at
least one version in the region. Generating a history extract is
somewhat slower than a normal data extract.
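
The restriction above can be checked before a long run is started. The
following is a minimal sketch; the function name check_strategy is
hypothetical and not part of osmium.

```python
def check_strategy(strategy: str, with_history: bool) -> None:
    """Raise ValueError for combinations the text above rules out."""
    known = {"simple", "complete_ways", "smart"}
    if strategy not in known:
        raise ValueError(f"unknown strategy: {strategy}")
    # History files currently work with the complete_ways strategy only.
    if with_history and strategy != "complete_ways":
        raise ValueError("history files require the complete_ways strategy")
```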

Osmium makes sure that all nodes on the vertices of the region boundary
are in the extract, but nodes that lie directly on the boundary between
those vertices may or may not end up in the extract. In almost all
cases this is good enough, but if you want to be sure you got
everything, use a small buffer around your region.
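
For a bounding box, such a buffer can be added with a few lines of
code. This is a sketch only; the helper name buffered_bbox and the
default margin of 0.01 degrees are assumptions chosen for illustration.

```python
def buffered_bbox(left, bottom, right, top, margin=0.01):
    """Grow a bounding box by `margin` degrees on every side.

    The result is clamped to valid longitudes and latitudes.
    """
    return (
        max(left - margin, -180.0),
        max(bottom - margin, -90.0),
        min(right + margin, 180.0),
        min(top + margin, 90.0),
    )
```

The four values can then be joined with commas and passed to the
--bbox/-b option.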

By default no bounds will be set in the header of the output file. Use
the --set-bounds option if you need this.

Note that osmium extract will never clip any OSM objects, i.e. it will
not remove node references outside the region from ways or unused
relation members from relations. This means you might get objects that
are not reference-complete. It has the advantage that you can use
osmium merge to merge several extracts without problems.

OPTIONS
-b, --bbox=LONG1,LAT1,LONG2,LAT2
    Set the bounding box to cut out. Cannot be used with --polygon/-p,
    --config/-c, or --directory/-d. The coordinates LONG1,LAT1 are
    from one arbitrary corner, the coordinates LONG2,LAT2 are from the
    opposite corner.

-c, --config=FILE
    Set the name of the config file. Cannot be used with the --bbox/-b
    or --polygon/-p option. If this is set, the --output/-o and
    --output-format/-f options are ignored, because they are set in
    the config file.

-d, --directory=DIRECTORY
    Output directory. Output file names in the config file are
    relative to this directory. Overrides the setting of the same name
    in the config file. This option is ignored when the --bbox/-b or
    --polygon/-p options are used; set the output directory and name
    with the --output/-o option in that case.

-H, --with-history
    Specify that the input file is a history file. The output file(s)
    will also be history file(s).

-p, --polygon=POLYGON_FILE
    Set the polygon to cut out based on the contents of the file. The
    file has to be a GeoJSON, poly, or OSM file as described in the
    (MULTI)POLYGON FILE FORMATS section. It has to have the right
    suffix to be detected correctly. Cannot be used with --bbox/-b,
    --config/-c, or --directory/-d.

-s, --strategy=STRATEGY
    Use the given strategy to extract the region. For possible values
    and details see the STRATEGIES section. Default is
    “complete_ways”.

-S, --option=OPTION=VALUE
    Set a named option for the strategy. If needed you can specify
    this option multiple times to set several options.

--set-bounds
    Set the bounds field in the header. The bounds are set to the bbox
    or envelope of the polygon specified for the extract. Note that
    strategies other than “simple” can put nodes outside those bounds
    into the output file.

COMMON OPTIONS
-h, --help
    Show usage help.

-v, --verbose
    Set verbose mode. The program will output information about what
    it is doing to STDERR.

INPUT OPTIONS
-F, --input-format=FORMAT
    The format of the input file(s). Can be used to set the input
    format if it can't be autodetected from the file name(s). This
    will set the format for all input files; there is no way to set
    the format for some input files only. See osmium-file-formats(5)
    or the libosmium manual for details.

OUTPUT OPTIONS
-f, --output-format=FORMAT
    The format of the output file. Can be used to set the output file
    format if it can't be autodetected from the output file name. See
    osmium-file-formats(5) or the libosmium manual for details.

--fsync
    Call fsync after writing the output file to force flushing buffers
    to disk.

--generator=NAME
    The name and version of the program generating the output file. It
    will be added to the header of the output file. Default is
    “osmium/” and the version of osmium.

-o, --output=FILE
    Name of the output file. Default is `-' (STDOUT).

-O, --overwrite
    Allow an existing output file to be overwritten. Normally osmium
    will refuse to write over an existing file.

--output-header=OPTION=VALUE
    Add output header option. This command line option can be used
    multiple times for different OPTIONs. See the libosmium manual for
    a list of available header options.

CONFIG FILE
The config file mainly specifies the file names and the regions of the
extracts that should be created.

The config file is in JSON format. The top level is an object which
contains at least an “extracts” array. It can also contain a
“directory” entry which names the directory where all the output files
will be created:

    {
        "extracts": [...],
        "directory": "/tmp/"
    }

The extracts array specifies the extracts that should be created. Each
item in the array is an object with at least an “output” property
naming the output file and a region defined in a “bbox”, “polygon”, or
“multipolygon” property. An optional “description” can be added; it is
not used by the program but can help with documenting the file
contents. You can add an optional “output_format” if the format cannot
be detected from the “output” file name. Run “osmium help
file-formats” to get a description of allowed formats. The optional
“output_header” allows you to set additional OSM file header settings
such as the “generator”.

    "extracts": [
        {
            "output": "hamburg.osm.pbf",
            "output_format": "pbf",
            "description": "optional description",
            "bbox": ...
        },
        {
            "output": "berlin.osm.pbf",
            "description": "optional description",
            "polygon": ...
        },
        {
            "output": "munich.osm.pbf",
            "output_header": {
                "generator": "MyExtractor/1.0"
            },
            "description": "optional description",
            "multipolygon": ...
        }
    ]
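
When many extracts are needed, the config file can be generated
instead of written by hand. The following is a minimal sketch using
only the properties described above; the function name build_config is
hypothetical, and the check for exactly one region property per
extract is an assumption, not documented osmium behaviour.

```python
import json


def build_config(extracts, directory=None):
    """Return the JSON text of a config file for osmium extract --config."""
    for item in extracts:
        if "output" not in item:
            raise ValueError("every extract needs an 'output' file name")
        regions = [k for k in ("bbox", "polygon", "multipolygon") if k in item]
        # Assumption: each extract names exactly one region.
        if len(regions) != 1:
            raise ValueError(f"{item['output']}: expected exactly one region")
    config = {"extracts": list(extracts)}
    if directory is not None:
        config["directory"] = directory
    return json.dumps(config, indent=4)


config_text = build_config(
    [
        {"output": "munich.osm.pbf", "bbox": [11.35, 48.05, 11.73, 48.25]},
        {
            "output": "dresden.osm.pbf",
            "bbox": {"left": 13.57, "right": 13.97, "top": 51.18, "bottom": 50.97},
        },
    ],
    directory="/tmp/",
)
```

Writing config_text to a file gives something that can be passed to
the --config/-c option.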

There are several formats for specifying the regions:

bbox:

A bounding box in one of two formats. The first is a simple array with
four real numbers, the first two specifying the coordinates of an
arbitrary corner, the second two specifying the coordinates of the
opposite corner.

    {
        "output": "munich.osm.pbf",
        "description": "Bounding box specified in array format",
        "bbox": [11.35, 48.05, 11.73, 48.25]
    }

The second format uses an object instead of an array:

    {
        "output": "dresden.osm.pbf",
        "description": "Bounding box specified in object format",
        "bbox": {
            "left": 13.57,
            "right": 13.97,
            "top": 51.18,
            "bottom": 50.97
        }
    }
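
The two formats carry the same information. As an illustration, this
sketch converts the array format into the object format; because the
array may start at any corner, the smaller and larger values are
sorted out explicitly. The function name is hypothetical.

```python
def bbox_array_to_object(bbox):
    """Convert [lon1, lat1, lon2, lat2] into the left/right/top/bottom form."""
    lon1, lat1, lon2, lat2 = bbox
    return {
        "left": min(lon1, lon2),
        "right": max(lon1, lon2),
        "top": max(lat1, lat2),
        "bottom": min(lat1, lat2),
    }
```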

polygon:

A polygon, either specified inline in the config file or read from an
external file. See the (MULTI)POLYGON FILE FORMATS section for
external files. If specified inline this is a nested array: the outer
array defines the polygon, the next level the rings, and the innermost
arrays the coordinates. This format is the same as in GeoJSON files.

In this example there is only one outer ring:

    "polygon": [[
        [9.613465, 53.58071],
        [9.647599, 53.59655],
        [9.649288, 53.61059],
        [9.613465, 53.58071]
    ]]

In each ring, the last set of coordinates should be the same as the
first set, closing the ring.
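
Rings taken from other sources are not always closed. A small helper
along these lines can close them before the polygon is written into
the config file; the name close_rings is hypothetical.

```python
def close_rings(polygon):
    """Return a copy of `polygon` where every ring ends on its first point."""
    closed = []
    for ring in polygon:
        ring = [list(point) for point in ring]
        if ring and ring[0] != ring[-1]:
            ring.append(list(ring[0]))
        closed.append(ring)
    return closed
```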

multipolygon:

A multipolygon, either specified inline in the config file or read
from an external file. See the (MULTI)POLYGON FILE FORMATS section for
external files. If specified inline this is a nested array: the outer
array defines the multipolygon, the next level the polygons, the next
the rings, and the innermost arrays the coordinates. This format is
the same as in GeoJSON files.

In this example there is one outer and one inner ring:

    "multipolygon": [[[
        [6.847, 50.987],
        [6.910, 51.007],
        [7.037, 50.953],
        [6.967, 50.880],
        [6.842, 50.925],
        [6.847, 50.987]
    ],[
        [6.967, 50.954],
        [6.969, 50.920],
        [6.932, 50.928],
        [6.934, 50.950],
        [6.967, 50.954]
    ]]]

In each ring, the last set of coordinates should be the same as the
first set, closing the ring.

Osmium must check every node in the input data and find out which
bounding boxes or (multi)polygons contain it. This is very cheap for
bounding boxes, but more expensive for (multi)polygons, and it becomes
more expensive the more vertices the (multi)polygon has. Use bounding
boxes or simplified polygons where possible.
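
To see why the cost grows with the number of vertices, consider the
textbook even-odd ray casting test sketched below. It is shown for
illustration only and is not claimed to be the algorithm Osmium uses;
all function names are hypothetical. Every call looks at every edge of
every ring, while a bounding box test needs four comparisons.

```python
def point_in_ring(lon, lat, ring):
    """Even-odd test of a point against one closed ring."""
    inside = False
    # The ring is closed, so consecutive pairs cover every edge.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_multipolygon(lon, lat, multipolygon):
    """True if the point is in an outer ring and in none of its inner rings."""
    for outer, *inners in multipolygon:
        if point_in_ring(lon, lat, outer) and not any(
            point_in_ring(lon, lat, inner) for inner in inners
        ):
            return True
    return False


def point_in_bbox(lon, lat, left, bottom, right, top):
    """Bounding box test: constant cost regardless of region shape."""
    return left <= lon <= right and bottom <= lat <= top
```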

Note that bounding boxes or (multi)polygons are not allowed to span
the -180/180 degree line. If you need this, cut out the regions on
each side and use osmium merge to join the resulting files.
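
Splitting a bounding box at the -180/180 degree line could be sketched
as follows. The function name and the convention that a west edge
greater than the east edge marks a region crossing the line are
assumptions made for this example; osmium itself does not interpret
bounding boxes this way.

```python
def split_at_antimeridian(west, south, east, north):
    """Split a region into boxes that stay on one side of the 180 degree line.

    Assumed convention: west > east means the region crosses the line.
    """
    if west <= east:
        return [(west, south, east, north)]
    return [
        (west, south, 180.0, north),
        (-180.0, south, east, north),
    ]
```

Each resulting box can be cut out separately with the --bbox/-b option
and the two output files joined with osmium merge.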

(MULTI)POLYGON FILE FORMATS
External files describing a (multi)polygon are specified in the config
file using the “file_name” and “file_type” properties on the “polygon”
or “multipolygon” object:

    "polygon": {
        "file_name": "berlin.geojson",
        "file_type": "geojson"
    }

If file names don't start with a slash (/), they are interpreted
relative to the directory where the config file is. If the “file_type”
is missing, Osmium will try to autodetect it from the suffix of the
“file_name”.

The following file types are supported:

geojson
    GeoJSON file containing exactly one Feature of type Polygon or
    MultiPolygon, or a FeatureCollection with the first Feature of
    type Polygon or MultiPolygon. Everything except the actual
    geometry (of the first Feature) is ignored.

poly
    A poly file as described in
    https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format .
    This wiki page also mentions several sources for such poly files.

osm
    An OSM file containing one or more multipolygon or boundary
    relations together with all the nodes and ways needed. Any OSM
    file format (XML, PBF, ...) supported by Osmium can be used here,
    but the correct suffix must be used so that the file format is
    detected correctly. Files for this can easily be obtained by
    searching for the area on OSM and then downloading the full
    relation using a URL like
    https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full .
    Or you can use osmium getid -r to get a specific relation from an
    OSM file. Note that both these approaches can get you very
    detailed boundaries which can take quite a while to cut out.
    Consider simplifying the boundary before use.

If there are several (multi)polygons in a poly file or OSM file, they
will be merged. The (multi)polygons must not overlap, otherwise the
result is undefined.
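
A GeoJSON file of the first kind, holding exactly one Feature of type
Polygon, can be produced from a list of rings as sketched below. The
function names are hypothetical; the Feature layout follows the
GeoJSON format, and the region object uses the “file_name” and
“file_type” properties described above.

```python
import json


def polygon_feature(rings):
    """Wrap closed rings in a GeoJSON Feature of type Polygon."""
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": rings},
    }


def polygon_region(file_name):
    """Config file entry that points at an external GeoJSON file."""
    return {"file_name": file_name, "file_type": "geojson"}


rings = [[
    [9.613465, 53.58071],
    [9.647599, 53.59655],
    [9.649288, 53.61059],
    [9.613465, 53.58071],
]]
geojson_text = json.dumps(polygon_feature(rings), indent=2)
```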

STRATEGIES
osmium extract can use different strategies for creating the extracts.
Depending on the strategy different objects will end up in the
extracts. The strategies differ in how much memory they need and how
often they need to read the input file. The choice of strategy depends
on how you want to use the generated extracts and how much memory and
time you have.

The default strategy is complete_ways.

Strategy simple
    Runs in a single pass. The extract will contain all nodes inside
    the region and all ways referencing those nodes as well as all
    relations referencing any nodes or ways already included. Ways
    crossing the region boundary will not be reference-complete.
    Relations will not be reference-complete. This strategy is fast,
    because it reads the input only once, but the result is not
    sufficient for most use cases. It is the only strategy that will
    work when reading from a socket or pipe. This strategy will not
    work for history files.

Strategy complete_ways
    Runs in two passes. The extract will contain all nodes inside the
    region and all ways referencing those nodes as well as all nodes
    referenced by those ways. The extract will also contain all
    relations referencing nodes inside the region or ways already
    included and, recursively, their parent relations. The ways are
    reference-complete, but the relations are not.

Strategy smart
    Runs in three passes. The extract will contain all nodes inside
    the region and all ways referencing those nodes as well as all
    nodes referenced by those ways. The extract will also contain all
    relations referencing nodes inside the region or ways already
    included and, recursively, their parent relations. The extract
    will also contain all nodes and ways (and the nodes they
    reference) referenced by relations tagged “type=multipolygon”
    directly referencing any nodes in the region or ways referencing
    nodes in the region. The ways are reference-complete, and all
    multipolygon relations referencing nodes in the region or ways
    that have nodes in the region are reference-complete. Other
    relations are not reference-complete.
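
The difference between simple and complete_ways can be illustrated
with a toy model that works on small in-memory dictionaries. It leaves
out relations and the smart strategy entirely and says nothing about
how osmium is implemented; all names are hypothetical.

```python
def toy_extract(nodes, ways, in_region, strategy="complete_ways"):
    """Toy model of node and way selection.

    nodes: {node_id: (lon, lat)}, ways: {way_id: [node_id, ...]}.
    Returns the sets of node IDs and way IDs in the extract.
    """
    if strategy not in ("simple", "complete_ways"):
        raise ValueError("this toy model covers simple and complete_ways only")
    # All nodes inside the region.
    node_ids = {nid for nid, (lon, lat) in nodes.items() if in_region(lon, lat)}
    # All ways referencing at least one of those nodes.
    way_ids = {wid for wid, refs in ways.items() if node_ids.intersection(refs)}
    if strategy == "complete_ways":
        # Add the remaining nodes of those ways to make them complete.
        for wid in way_ids:
            node_ids.update(ways[wid])
    return node_ids, way_ids


nodes = {1: (11.40, 48.10), 2: (11.80, 48.10), 3: (12.00, 48.30)}
ways = {10: [1, 2], 11: [2, 3]}


def in_munich(lon, lat):
    return 11.35 <= lon <= 11.73 and 48.05 <= lat <= 48.25
```

With this data only node 1 lies inside the region. The simple strategy
keeps way 10 but not its second node, so the way is not
reference-complete; complete_ways adds node 2. Way 11 is in neither
result because none of its nodes is inside the region.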

For the smart strategy you can change the types of relations that
should be reference-complete. Instead of just relations tagged
“type=multipolygon”, you can either get all relations (use “-S
types=any”) or give a list of types to the -S option: “-S
types=multipolygon,route”. Note that boundary relations in particular
can be huge, so if you include them, be aware that your result might
be huge.

The smart strategy allows another option, “-S
complete-partial-relations=X”. If this is set, all relations that have
more than X percent of their members already in the extract will have
their full set of members in the extract. This allows completing
almost complete relations. It can be useful, for instance, to make
sure a boundary relation is complete even if some of it is outside the
polygon used for extraction.
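
When osmium is called from a script, the strategy options can be
assembled as an argument list. The sketch below only builds the list
and does not run anything; the function name is hypothetical, and the
file names in the usage note are taken from the EXAMPLES section.

```python
def smart_extract_command(input_file, polygon_file, output_file,
                          types=None, complete_partial_relations=None):
    """Build an argument list for osmium extract with the smart strategy."""
    command = ["osmium", "extract", "-p", polygon_file, "--strategy=smart"]
    if types:
        # For example ["multipolygon", "route"] or ["any"].
        command += ["-S", "types=" + ",".join(types)]
    if complete_partial_relations is not None:
        command += ["-S", f"complete-partial-relations={complete_partial_relations}"]
    command += [input_file, "-o", output_file]
    return command
```

For example, smart_extract_command("germany-latest.osm.pbf",
"karlsruhe-boundary.osm.bz2", "karlsruhe.osm.pbf", types=["multipolygon",
"route"], complete_partial_relations=90) returns a list that can be
handed to subprocess.run.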

DIAGNOSTICS
osmium extract exits with exit code

0   if everything went alright,

1   if there was an error processing the data, or

2   if there was a problem with the command line arguments, config
    file or polygon files.
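
A calling script might translate these exit codes into messages, for
instance when inspecting the returncode of a finished process. This is
a sketch; the names are hypothetical.

```python
EXIT_CODES = {
    0: "everything went alright",
    1: "error processing the data",
    2: "problem with the command line arguments, config file or polygon files",
}


def describe_exit_code(code):
    """Map an osmium extract exit code to the meaning listed above."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")
```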

MEMORY USAGE
Memory usage of osmium extract depends on the number of extracts and
on the strategy used. For the simple strategy it will be at least the
number of extracts times the highest node ID used divided by 8, in
bytes. For the complete_ways strategy it is twice that, and for the
smart strategy a bit more.
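
The rule of thumb above can be turned into a quick estimate. This is a
sketch under two assumptions: the result is read as bytes, and for the
smart strategy, where the text only says “a bit more”, the
complete_ways figure is returned as a lower bound. The function name
is hypothetical.

```python
def min_memory_bytes(num_extracts, highest_node_id, strategy="complete_ways"):
    """Lower bound on memory use following the rule of thumb above."""
    base = num_extracts * highest_node_id // 8
    if strategy == "simple":
        return base
    if strategy in ("complete_ways", "smart"):
        # smart needs a bit more than this; the amount is not specified.
        return 2 * base
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, ten extracts from a file whose highest node ID is
8,000,000,000 need at least 20 GB with the default strategy, which is
one reason to split large files in several steps.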

If you want to split a large file into many extracts, do this in
several steps. First create several larger extracts and then split
them again and again into smaller pieces.

EXAMPLES
See the example config files in the extract-example-config directory.
To try it:

    osmium extract -v -c extract-example-config/extracts.json \
        germany-latest.osm.pbf

Extract the city of Karlsruhe using a boundary polygon:

    osmium extract -p karlsruhe-boundary.osm.bz2 germany-latest.osm.pbf \
        -o karlsruhe.osm.pbf

Extract the city of Munich using a bounding box:

    osmium extract -b 11.35,48.05,11.73,48.25 germany-latest.osm.pbf \
        -o munich.osm.pbf

SEE ALSO
· osmium(1), osmium-file-formats(5), osmium-getid(1), osmium-merge(1)

· Osmium website (https://osmcode.org/osmium-tool/)

COPYRIGHT
Copyright (C) 2013-2018 Jochen Topf <jochen@topf.org>.

License GPLv3+: GNU GPL version 3 or later
<https://gnu.org/licenses/gpl.html>. This is free software: you are
free to change and redistribute it. There is NO WARRANTY, to the
extent permitted by law.

CONTACT
If you have any questions or want to report a bug, please go to
https://osmcode.org/contact.html

AUTHORS
Jochen Topf <jochen@topf.org>.

1.10.0                                               OSMIUM-EXTRACT(1)