OSMIUM-EXTRACT(1)                                          OSMIUM-EXTRACT(1)

NAME
       osmium-extract - create geographical extracts from an OSM file

SYNOPSIS
       osmium extract --config CONFIG-FILE [OPTIONS] OSM-FILE
       osmium extract --bbox LEFT,BOTTOM,RIGHT,TOP [OPTIONS] OSM-FILE
       osmium extract --polygon POLYGON-FILE [OPTIONS] OSM-FILE

DESCRIPTION
       Create geographical extracts from an OSM data file or an OSM history
       file. The region (geographical extent) can be given as a bounding box
       or as a (multi)polygon.

       There are three ways of calling this command:

       • Specify a config file with the --config/-c option. It can define
         any number of regions you want to cut out. See the CONFIG FILE
         section for details.

       • Specify a bounding box to cut out with the --bbox/-b option.

       • Specify a (multi)polygon to cut out with the --polygon/-p option.

       The input file is assumed to be sorted in the usual order: nodes
       first, then ways, then relations.

       If the --with-history/-H option is used, the command will work
       correctly for history files. This currently works for the
       complete_ways strategy only; the simple and smart strategies do not
       work with history files. A history extract will contain every version
       of all objects with at least one version in the region. Generating a
       history extract is somewhat slower than a normal data extract.
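
       The selection rule for history extracts can be modeled in a few
       lines. This is a minimal Python sketch and not osmium's
       implementation. The NodeVersion type and the inside() predicate are
       hypothetical, and only nodes are modeled; ways and relations are left
       out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class NodeVersion:
    id: int
    version: int
    location: Optional[tuple[float, float]]  # (lon, lat); None for deleted versions


def history_extract(versions: list[NodeVersion],
                    inside: Callable[[float, float], bool]) -> list[NodeVersion]:
    """Keep every version of each node that has at least one version inside the region."""
    hit_ids = {v.id for v in versions
               if v.location is not None and inside(*v.location)}
    return [v for v in versions if v.id in hit_ids]


def inside(lon: float, lat: float) -> bool:
    # Hypothetical region: the unit square.
    return 0 <= lon <= 1 and 0 <= lat <= 1


versions = [
    NodeVersion(1, 1, (0.5, 0.5)),
    NodeVersion(1, 2, (5.0, 5.0)),  # moved outside the region
    NodeVersion(1, 3, None),        # deleted
    NodeVersion(2, 1, (5.0, 5.0)),  # never inside
]
kept = history_extract(versions, inside)
```

       Node 1 keeps all three versions, including the moved and deleted
       ones. Node 2 was never inside the region, so it is dropped.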

       Osmium will make sure that all nodes on the vertices of the boundary
       of the region will be in the extract. Nodes that lie directly on the
       boundary between those vertices might or might not end up in the
       extract. In almost all cases this will be good enough, but if you
       want to make really sure you got everything, use a small buffer
       around your region.
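
       For a bounding box, the buffer can be as simple as moving each edge
       outwards. The following is a minimal sketch. The function name and
       the default margin of 0.001 degrees (roughly 100 m in latitude) are
       illustrative assumptions, not osmium defaults. Polygons would need a
       real geometry buffer operation instead.

```python
def buffer_bbox(left, bottom, right, top, margin=0.001):
    """Grow a bounding box by `margin` degrees on every side, clamped to valid coordinates."""
    return (
        max(left - margin, -180.0),
        max(bottom - margin, -90.0),
        min(right + margin, 180.0),
        min(top + margin, 90.0),
    )


# Build a value for the --bbox/-b option from the Munich example box.
print(",".join(f"{c:.4f}" for c in buffer_bbox(11.35, 48.05, 11.73, 48.25)))
# prints 11.3490,48.0490,11.7310,48.2510
```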

       By default no bounds will be set in the header of the output file.
       Use the --set-bounds option if you need this.

       Note that osmium extract will never clip any OSM objects, i.e. it will
       not remove node references outside the region from ways or unused
       relation members from relations. This means you might get objects
       that are not reference-complete. The advantage is that you can use
       osmium merge to merge several extracts without problems.

OPTIONS
       -b, --bbox=LONG1,LAT1,LONG2,LAT2
              Set the bounding box to cut out. Cannot be used with
              --polygon/-p, --config/-c, or --directory/-d. The coordinates
              LONG1,LAT1 are from one arbitrary corner, the coordinates
              LONG2,LAT2 are from the opposite corner.

       -c, --config=FILE
              Set the name of the config file. Cannot be used with the
              --bbox/-b or --polygon/-p option. If this is set, the
              --output/-o and --output-format/-f options are ignored,
              because the output files and formats are set in the config
              file.

       -d, --directory=DIRECTORY
              Output directory. Output file names in the config file are
              relative to this directory. Overrides the setting of the same
              name in the config file. This option is ignored when the
              --bbox/-b or --polygon/-p options are used; in that case, set
              the output directory and name with the --output/-o option.

       -H, --with-history
              Specify that the input file is a history file. The output
              file(s) will also be history file(s).

       -p, --polygon=POLYGON_FILE
              Set the polygon to cut out based on the contents of the file.
              The file has to be a GeoJSON, poly, or OSM file as described in
              the (MULTI)POLYGON FILE FORMATS section. It has to have the
              right suffix to be detected correctly. Cannot be used with
              --bbox/-b, --config/-c, or --directory/-d.

       -s, --strategy=STRATEGY
              Use the given strategy to extract the region. For possible
              values and details see the STRATEGIES section. Default is
              “complete_ways”.

       -S, --option=OPTION=VALUE
              Set a named option for the strategy. If needed, you can specify
              this option multiple times to set several options.

       --set-bounds
              Set the bounds field in the header. The bounds are set to the
              bbox or the envelope of the polygon specified for the extract.
              Note that strategies other than “simple” can put nodes outside
              those bounds into the output file.
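
       The envelope used by --set-bounds is the smallest bounding box that
       contains every vertex of the (multi)polygon. The following is a
       minimal sketch, assuming the GeoJSON-style nesting described in the
       CONFIG FILE section (a polygon is a list of rings, and a ring is a
       list of [lon, lat] pairs). The function name is hypothetical.

```python
def envelope(rings):
    """Return (left, bottom, right, top) covering every vertex of the given rings."""
    lons = [lon for ring in rings for lon, _ in ring]
    lats = [lat for ring in rings for _, lat in ring]
    return (min(lons), min(lats), max(lons), max(lats))


# The inline polygon example from the CONFIG FILE section.
polygon = [[
    [9.613465, 53.58071],
    [9.647599, 53.59655],
    [9.649288, 53.61059],
    [9.613465, 53.58071],
]]
bounds = envelope(polygon)
```

       For a multipolygon, pass all rings of all polygons, for example
       envelope([ring for poly in multipolygon for ring in poly]).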

COMMON OPTIONS
       -h, --help
              Show usage help.

       -v, --verbose
              Set verbose mode. The program will output information about
              what it is doing to STDERR.

INPUT OPTIONS
       -F, --input-format=FORMAT
              The format of the input file(s). Can be used to set the input
              format if it can’t be autodetected from the file name(s). This
              will set the format for all input files; there is no way to set
              the format for some input files only. See osmium-file-formats(5)
              or the libosmium manual for details.

OUTPUT OPTIONS
       -f, --output-format=FORMAT
              The format of the output file. Can be used to set the output
              file format if it can’t be autodetected from the output file
              name. See osmium-file-formats(5) or the libosmium manual for
              details.

       --fsync
              Call fsync after writing the output file to force flushing
              buffers to disk.

       --generator=NAME
              The name and version of the program generating the output file.
              It will be added to the header of the output file. Default is
              “osmium/” and the version of osmium.

       -o, --output=FILE
              Name of the output file. Default is `-' (STDOUT).

       -O, --overwrite
              Allow an existing output file to be overwritten. Normally
              osmium will refuse to write over an existing file.

       --output-header=OPTION=VALUE
              Add output header option. This command line option can be used
              multiple times for different OPTIONs. See the libosmium manual
              for a list of available header options. For some commands you
              can use the special format “OPTION!” (i.e. an exclamation mark
              after the OPTION and no value set) to set the value to the same
              as in the input file.

CONFIG FILE
       The config file mainly specifies the file names and the regions of
       the extracts that should be created.

       The config file is in JSON format. The top level is an object which
       contains at least an “extracts” array. It can also contain a
       “directory” entry which names the directory where all the output
       files will be created:

              {
                  "extracts": [...],
                  "directory": "/tmp/"
              }

       The extracts array specifies the extracts that should be created.
       Each item in the array is an object with at least a name “output”
       naming the output file and a region defined in a “bbox”, “polygon” or
       “multipolygon” name. An optional “description” can be added; it is
       not used by the program but can help with documenting the file
       contents. You can add an optional “output_format” if the format
       cannot be detected from the “output” file name. Run “osmium help
       file-formats” to get a description of allowed formats.

       The optional “output_header” allows you to set additional OSM file
       header settings such as the “generator”. If you set the value of a
       file header setting to null, the value of that setting is copied from
       the header of the input file.

              "extracts": [
                  {
                      "output": "hamburg.osm.pbf",
                      "output_format": "pbf",
                      "description": "optional description",
                      "bbox": ...
                  },
                  {
                      "output": "berlin.osm.pbf",
                      "description": "optional description",
                      "polygon": ...
                  },
                  {
                      "output": "munich.osm.pbf",
                      "output_header": {
                          "generator": "MyExtractor/1.0",
                          "osmosis_replication_timestamp": null
                      },
                      "description": "optional description",
                      "multipolygon": ...
                  }
              ]
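
       When many extracts are needed, the config file can be generated
       rather than written by hand. The following is a minimal sketch using
       Python's standard json module. The build_config() helper is
       hypothetical. Note that Python's None is serialized as JSON null,
       which triggers the "copy from input header" behavior described above.

```python
import json


def build_config(extracts, directory=None):
    """Assemble an osmium extract config document as a JSON string."""
    config = {"extracts": extracts}
    if directory is not None:
        config["directory"] = directory
    return json.dumps(config, indent=4)


extracts = [
    {
        "output": "munich.osm.pbf",
        "description": "Munich bounding box",
        "bbox": [11.35, 48.05, 11.73, 48.25],
        "output_header": {
            "generator": "MyExtractor/1.0",
            # None becomes null: take this header value from the input file.
            "osmosis_replication_timestamp": None,
        },
    },
]

text = build_config(extracts, directory="/tmp/")
```

       After writing text to a file such as extracts.json, the config can be
       used with osmium extract -c extracts.json INPUT-FILE.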

       There are several formats for specifying the regions:

       bbox:

       A bounding box in one of two formats. The first is a simple array
       with four real numbers: the first two specify the coordinates
       (longitude, then latitude) of an arbitrary corner, and the second two
       specify the coordinates of the opposite corner.

              {
                  "output": "munich.osm.pbf",
                  "description": "Bounding box specified in array format",
                  "bbox": [11.35, 48.05, 11.73, 48.25]
              }

       The second format uses an object instead of an array:

              {
                  "output": "dresden.osm.pbf",
                  "description": "Bounding box specified in object format",
                  "bbox": {
                      "left": 13.57,
                      "right": 13.97,
                      "top": 51.18,
                      "bottom": 50.97
                  }
              }
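
       All bounding box notations on this page describe the same thing. The
       --bbox/-b string and the array format give two opposite corners in
       any order, and the object format names each edge. The following
       minimal sketch normalizes all three to (left, bottom, right, top).
       The function name is hypothetical, and osmium's own parser may
       validate its input differently.

```python
def normalize_bbox(value):
    """Return (left, bottom, right, top) from a bbox string, array or object.

    Accepts the --bbox/-b string "LONG1,LAT1,LONG2,LAT2", the config-file
    array [lon1, lat1, lon2, lat2], or the config-file object with
    "left", "bottom", "right" and "top" keys.
    """
    if isinstance(value, dict):
        # Object form names each edge explicitly; take the values as given.
        return tuple(float(value[k]) for k in ("left", "bottom", "right", "top"))
    if isinstance(value, str):
        value = value.split(",")
    if len(value) != 4:
        raise ValueError("a bbox needs exactly four numbers")
    lon1, lat1, lon2, lat2 = (float(v) for v in value)
    # The corners are arbitrary opposite corners, so sort each axis.
    return (min(lon1, lon2), min(lat1, lat2), max(lon1, lon2), max(lat1, lat2))
```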

       polygon:

       A polygon, either specified inline in the config file or read from an
       external file. See the (MULTI)POLYGON FILE FORMATS section for
       external files. If specified inline, this is a nested array: the
       outer array defines the polygon, the next arrays the rings, and the
       innermost arrays the coordinates. This format is the same as in
       GeoJSON files.

       In this example there is only one outer ring:

              "polygon": [[
                  [9.613465, 53.58071],
                  [9.647599, 53.59655],
                  [9.649288, 53.61059],
                  [9.613465, 53.58071]
              ]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.
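
       If rings are generated by another tool, they can be closed
       automatically before they are written to the config file. The
       following is a minimal sketch that works on the nested-list format
       shown above. Both function names are hypothetical.

```python
def close_ring(ring):
    """Return the ring with its first coordinate repeated at the end if it is not already closed."""
    if len(ring) < 3:
        raise ValueError("a ring needs at least three points")
    if ring[0] != ring[-1]:
        return ring + [ring[0]]
    return ring


def close_polygon(polygon):
    """Apply close_ring to every ring of a polygon given as a list of rings."""
    return [close_ring(list(ring)) for ring in polygon]
```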

       multipolygon:

       A multipolygon, either specified inline in the config file or read
       from an external file. See the (MULTI)POLYGON FILE FORMATS section for
       external files. If specified inline, this is a nested array: the
       outer array defines the multipolygon, the next arrays the polygons,
       the next the rings, and the innermost arrays the coordinates. This
       format is the same as in GeoJSON files.

       In this example there is one outer and one inner ring:

              "multipolygon": [[[
                  [6.847, 50.987],
                  [6.910, 51.007],
                  [7.037, 50.953],
                  [6.967, 50.880],
                  [6.842, 50.925],
                  [6.847, 50.987]
              ],[
                  [6.967, 50.954],
                  [6.969, 50.920],
                  [6.932, 50.928],
                  [6.934, 50.950],
                  [6.967, 50.954]
              ]]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.

       Osmium must check each and every node in the input data to find out
       in which bounding boxes or (multi)polygons it lies. This is very
       cheap for bounding boxes, but more expensive for (multi)polygons, and
       it becomes more expensive the more vertices the (multi)polygon has.
       Use bounding boxes or simplified polygons where possible.
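
       The cost difference follows from the tests involved. The following
       sketch uses the classic even-odd (ray casting) point-in-polygon test,
       which is one common technique and not necessarily the one osmium
       uses. It has to look at every edge of every ring for each node. The
       bounding box test is four comparisons regardless of the region's
       shape. All function names are hypothetical.

```python
def point_in_ring(lon, lat, ring):
    """Even-odd ray casting: toggle on every ring edge crossed by a ray going east from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_multipolygon(lon, lat, multipolygon):
    """True if the point lies in an odd number of rings of some polygon (inner rings are holes)."""
    for polygon in multipolygon:
        hits = sum(point_in_ring(lon, lat, ring) for ring in polygon)
        if hits % 2 == 1:
            return True
    return False


def in_bbox(lon, lat, left, bottom, right, top):
    """Bounding box test: constant work, independent of any vertex count."""
    return left <= lon <= right and bottom <= lat <= top
```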

       Note that bounding boxes or (multi)polygons are not allowed to span
       the -180/180 degree line. If you need this, cut out the regions on
       each side and use osmium merge to join the resulting files.
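
       The following is a minimal sketch of the first step, splitting a box
       that crosses the antimeridian into one box on each side. It assumes
       the region is described by its western and eastern edges, so that
       west > east means the box wraps around 180 degrees. That convention
       is an assumption of this sketch, because osmium itself accepts the
       corners in any order and has no notion of a wrapping box.

```python
def split_at_antimeridian(west, south, east, north):
    """Split a box that wraps past 180 degrees into two (left, bottom, right, top) boxes."""
    if west <= east:
        return [(west, south, east, north)]  # no crossing: keep the box as is
    return [
        (west, south, 180.0, north),   # from the western edge up to +180
        (-180.0, south, east, north),  # from -180 up to the eastern edge
    ]
```

       Each resulting box becomes its own extract. The two output files can
       then be joined, for example with osmium merge part1.osm.pbf
       part2.osm.pbf -o combined.osm.pbf (file names are placeholders).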

(MULTI)POLYGON FILE FORMATS
       External files describing a (multi)polygon are specified in the
       config file using the “file_name” and “file_type” properties on the
       “polygon” or “multipolygon” object:

              "polygon": {
                  "file_name": "berlin.geojson",
                  "file_type": "geojson"
              }

       If file names don’t start with a slash (/), they are interpreted
       relative to the directory where the config file is. If the
       “file_type” is missing, Osmium will try to autodetect it from the
       suffix of the “file_name”.
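
       The path rule above can be reproduced when preparing or checking
       config files, for example to verify that every referenced polygon
       file exists. The following is a minimal sketch using pathlib. It
       assumes POSIX-style paths, and the function name is hypothetical.

```python
from pathlib import Path


def resolve_polygon_file(config_path, file_name):
    """Absolute names are used as-is; relative names are taken relative to the config file's directory."""
    path = Path(file_name)
    if path.is_absolute():
        return path
    return Path(config_path).parent / path
```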

       The following file types are supported:

       geojson
              GeoJSON file containing exactly one Feature of type Polygon or
              MultiPolygon, or a FeatureCollection with the first Feature of
              type Polygon or MultiPolygon. Everything except the actual
              geometry (of the first Feature) is ignored.

       poly   A poly file as described in
              https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format .
              This wiki page also mentions several sources for such poly
              files.

       osm    An OSM file containing one or more multipolygon or boundary
              relations together with all the nodes and ways needed. Any OSM
              file format (XML, PBF, ...) supported by Osmium can be used
              here, but the correct suffix must be used so that the file
              format is detected correctly. Files for this can easily be
              obtained by searching for the area on OSM and then downloading
              the full relation using a URL like
              https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full .
              Or you can use osmium getid -r to get a specific relation from
              an OSM file. Note that both these approaches can get you very
              detailed boundaries which can take quite a while to cut out.
              Consider simplifying the boundary before use.
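
       The poly format is simple enough to read by hand. As described on the
       Osmosis wiki page, the first line is the polygon's name. Each section
       then consists of a section name, coordinate lines (longitude, then
       latitude), and END. A section name starting with "!" marks an area
       to subtract, and a final END closes the file. The following is a
       minimal, simplified parser under those assumptions. It is not
       osmium's reader and does not validate or merge the rings.

```python
def parse_poly(text):
    """Parse Osmosis .poly text into a list of (is_hole, ring) tuples of (lon, lat) points."""
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]  # ignore blank lines
    if not lines:
        raise ValueError("empty poly file")
    rings = []
    i = 1  # line 0 is the polygon's name
    while i < len(lines):
        header = lines[i]
        if header == "END":
            return rings  # an END outside any section terminates the file
        is_hole = header.startswith("!")
        ring = []
        i += 1
        while i < len(lines) and lines[i] != "END":
            lon, lat = (float(v) for v in lines[i].split()[:2])
            ring.append((lon, lat))
            i += 1
        if i == len(lines):
            raise ValueError(f"section {header!r} is missing END")
        rings.append((is_hole, ring))
        i += 1  # skip the section's END
    raise ValueError("missing final END")
```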

       If there are several (multi)polygons in a poly file or OSM file, they
       will be merged. The (multi)polygons must not overlap, otherwise the
       result is undefined.

STRATEGIES
       osmium extract can use different strategies for creating the
       extracts. Depending on the strategy, different objects will end up in
       the extracts. The strategies differ in how much memory they need and
       how often they need to read the input file. The choice of strategy
       depends on how you want to use the generated extracts and how much
       memory and time you have.

       The default strategy is complete_ways.

       Strategy simple
              Runs in a single pass. The extract will contain all nodes
              inside the region and all ways referencing those nodes as well
              as all relations referencing any nodes or ways already
              included. Ways crossing the region boundary will not be
              reference-complete. Relations will not be reference-complete.
              This strategy is fast, because it reads the input only once,
              but the result is not sufficient for most use cases. It is the
              only strategy that will work when reading from a socket or
              pipe. This strategy will not work for history files.

       Strategy complete_ways
              Runs in two passes. The extract will contain all nodes inside
              the region and all ways referencing those nodes as well as all
              nodes referenced by those ways. The extract will also contain
              all relations referencing nodes inside the region or ways
              already included and, recursively, their parent relations. The
              ways are reference-complete, but the relations are not.

       Strategy smart
              Runs in three passes. The extract will contain all nodes inside
              the region and all ways referencing those nodes as well as all
              nodes referenced by those ways. The extract will also contain
              all relations referencing nodes inside the region or ways
              already included and, recursively, their parent relations. The
              extract will also contain all nodes and ways (and the nodes
              they reference) referenced by relations tagged
              “type=multipolygon” that directly reference any nodes in the
              region or ways referencing nodes in the region. The ways are
              reference-complete, and all multipolygon relations referencing
              nodes in the region or ways that have nodes in the region are
              reference-complete. Other relations are not reference-complete.
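
       The difference between the simple and complete_ways strategies can be
       seen on a tiny in-memory data set. The following is a toy model of
       the selection rules as described above, not osmium's multi-pass
       implementation. The data layout and function name are hypothetical.
       The smart strategy is not modeled. The sketch also assumes that the
       simple strategy does not add parent relations, because the text
       above does not mention them for that strategy.

```python
def extract(nodes, ways, relations, inside, strategy="complete_ways"):
    """Toy model of the simple and complete_ways strategies.

    nodes:     {node_id: (lon, lat)}
    ways:      {way_id: [node_id, ...]}
    relations: {rel_id: [(member_type, member_id), ...]}, member_type in "n", "w", "r"
    inside:    predicate (lon, lat) -> bool
    Returns the sets of selected node, way and relation IDs.
    """
    if strategy not in ("simple", "complete_ways"):
        raise ValueError(f"strategy not modeled in this sketch: {strategy}")

    # Nodes inside the region and the ways referencing them (both strategies).
    inside_nodes = {nid for nid, (lon, lat) in nodes.items() if inside(lon, lat)}
    way_ids = {wid for wid, refs in ways.items()
               if any(ref in inside_nodes for ref in refs)}

    # Relations referencing a node inside the region or an included way.
    rel_ids = {rid for rid, members in relations.items()
               if any((t == "n" and m in inside_nodes) or (t == "w" and m in way_ids)
                      for t, m in members)}

    node_ids = set(inside_nodes)
    if strategy == "complete_ways":
        # Add every node of an included way, even if it lies outside the region.
        for wid in way_ids:
            node_ids.update(ways[wid])
        # Add parent relations recursively until nothing changes.
        changed = True
        while changed:
            changed = False
            for rid, members in relations.items():
                if rid not in rel_ids and any(t == "r" and m in rel_ids for t, m in members):
                    rel_ids.add(rid)
                    changed = True

    return node_ids, way_ids, rel_ids
```

       With one node inside the region and a way leaving it, simple returns
       only the inside node, leaving the way incomplete. complete_ways also
       returns the way's outside node and any parent relations.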

       For the smart strategy you can change the types of relations that
       should be reference-complete. Instead of just relations tagged
       “type=multipolygon”, you can either get all relations (use
       “-S types=any”) or give a list of types to the -S option:
       “-S types=multipolygon,route”. Note that boundary relations in
       particular can be huge, so if you include them, your result might be
       huge, too.

       The smart strategy allows another option,
       “-S complete-partial-relations=X”. If this is set, all relations that
       have more than X percent of their members already in the extract will
       have their full set of members in the extract. This allows completing
       almost complete relations. It can be useful, for instance, to make
       sure a boundary relation is complete even if some of it is outside
       the polygon used for extraction.
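
       The threshold rule for complete-partial-relations can be illustrated
       as follows. This minimal sketch counts every member equally,
       whatever its type. That weighting is an assumption, and osmium may
       count members differently. The function name and data layout are
       hypothetical.

```python
def relations_to_complete(relations, selected, threshold_percent):
    """Return IDs of relations with more than threshold_percent of their members already selected.

    relations: {rel_id: [(member_type, member_id), ...]}
    selected:  set of (member_type, member_id) pairs already in the extract
    """
    result = set()
    for rid, members in relations.items():
        if not members:
            continue
        present = sum(1 for member in members if member in selected)
        # Integer comparison avoids rounding issues: present/len > threshold/100.
        if present * 100 > threshold_percent * len(members):
            result.add(rid)
    return result
```

       With X=50, a relation with 3 of 4 members present (75 percent) is
       completed. A relation with exactly half of its members present is
       not, because the rule requires more than X percent.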

DIAGNOSTICS
       osmium extract exits with exit code

       0      if everything went alright,

       1      if there was an error processing the data, or

       2      if there was a problem with the command line arguments, config
              file or polygon files.
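
       When osmium extract is called from a script, the exit code tells
       whether to look at the data or at the invocation. The following is a
       minimal Python sketch using the standard subprocess module. It
       assumes an osmium binary on the PATH, and the helper names are
       hypothetical.

```python
import subprocess

EXIT_MEANINGS = {
    0: "success",
    1: "error processing the data",
    2: "problem with the command line arguments, config file or polygon files",
}


def describe_exit(code: int) -> str:
    """Map an osmium extract exit code to the meaning listed in DIAGNOSTICS."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_extract(args):
    """Run osmium extract with the given arguments; raise with a readable message on failure."""
    proc = subprocess.run(["osmium", "extract", *args])
    if proc.returncode != 0:
        raise RuntimeError(describe_exit(proc.returncode))
```

       For example, run_extract(["-b", "11.35,48.05,11.73,48.25",
       "germany-latest.osm.pbf", "-o", "munich.osm.pbf"]) mirrors the
       bounding box example in the EXAMPLES section.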

MEMORY USAGE
       Memory usage of osmium extract depends on the number of extracts and
       on the strategy used. For the simple strategy it will be at least the
       number of extracts times the highest node ID used, divided by 8 (in
       bytes). For the complete_ways strategy it is about twice that, and
       for the smart strategy a bit more.
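
       The rule of thumb above can be turned into a quick estimate before
       starting a large run. The following is a minimal sketch. The highest
       node ID in the usage lines is a hypothetical round number, not a
       statement about current OSM data. The extra memory needed by the
       smart strategy is not modeled, so its result is a lower bound.

```python
def estimate_memory_bytes(num_extracts, max_node_id, strategy="simple"):
    """Lower-bound memory estimate following the MEMORY USAGE rule of thumb."""
    base = num_extracts * max_node_id / 8  # bytes for the simple strategy
    if strategy == "simple":
        return base
    if strategy in ("complete_ways", "smart"):
        # complete_ways needs about twice as much; smart needs "a bit more" (not modeled).
        return 2 * base
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical input: 10 extracts, highest node ID 8,000,000,000.
for s in ("simple", "complete_ways"):
    print(s, estimate_memory_bytes(10, 8_000_000_000, s) / 1e9, "GB")
# prints "simple 10.0 GB" and "complete_ways 20.0 GB"
```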

       If you want to split a large file into many extracts, do this in
       several steps. First create several larger extracts and then split
       them again and again into smaller pieces.
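
       One way to plan the later steps is to divide each first-stage
       bounding box into a grid of tiles. Each tile then becomes one entry
       in the config file that is run against the first-stage extract. The
       following minimal sketch does only the arithmetic. The grid approach
       and the function name are illustrative assumptions, and osmium does
       not prescribe how the pieces are chosen. Nodes exactly on shared tile
       edges may end up in either tile or both, so consider the buffering
       advice from the DESCRIPTION section.

```python
def split_bbox(left, bottom, right, top, cols, rows):
    """Split a bounding box into cols x rows equal tiles of (left, bottom, right, top)."""
    width = (right - left) / cols
    height = (top - bottom) / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                left + c * width,
                bottom + r * height,
                # Use the exact outer edges for the last column/row to avoid float drift.
                right if c == cols - 1 else left + (c + 1) * width,
                top if r == rows - 1 else bottom + (r + 1) * height,
            ))
    return tiles
```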

EXAMPLES
       See the example config files in the extract-example-config directory.
       To try them:

              osmium extract -v -c extract-example-config/extracts.json \
                  germany-latest.osm.pbf

       Extract the city of Karlsruhe using a boundary polygon:

              osmium extract -p karlsruhe-boundary.osm.bz2 germany-latest.osm.pbf \
                  -o karlsruhe.osm.pbf

       Extract the city of Munich using a bounding box:

              osmium extract -b 11.35,48.05,11.73,48.25 germany-latest.osm.pbf \
                  -o munich.osm.pbf

SEE ALSO
       • osmium(1), osmium-file-formats(5), osmium-getid(1), osmium-merge(1)

       • Osmium website (https://osmcode.org/osmium-tool/)

COPYRIGHT
       Copyright (C) 2013-2021 Jochen Topf <jochen@topf.org>.

       License GPLv3+: GNU GPL version 3 or later
       <https://gnu.org/licenses/gpl.html>. This is free software: you are
       free to change and redistribute it. There is NO WARRANTY, to the
       extent permitted by law.

CONTACT
       If you have any questions or want to report a bug, please go to
       https://osmcode.org/contact.html

AUTHORS
       Jochen Topf <jochen@topf.org>.



1.13.1                                                     OSMIUM-EXTRACT(1)