OSMIUM-EXTRACT(1)                                            OSMIUM-EXTRACT(1)

NAME
osmium-extract - create geographical extracts from an OSM file

SYNOPSIS
osmium extract --config CONFIG-FILE [OPTIONS] OSM-FILE
osmium extract --bbox LEFT,BOTTOM,RIGHT,TOP [OPTIONS] OSM-FILE
osmium extract --polygon POLYGON-FILE [OPTIONS] OSM-FILE

DESCRIPTION
Create geographical extracts from an OSM data file or an OSM history
file. The region (geographical extent) can be given as a bounding box
or as a (multi)polygon.

There are three ways of calling this command:

• Specify a config file with the --config/-c option. It can define any
  number of regions you want to cut out. See the CONFIG FILE section
  for details.

• Specify a bounding box to cut out with the --bbox/-b option.

• Specify a (multi)polygon to cut out with the --polygon/-p option.

The input file is assumed to be ordered in the usual order: nodes
first, then ways, then relations.

If the --with-history/-H option is used, the command will work
correctly for history files. This currently works for the
complete_ways strategy only; the simple and smart strategies do not
work with history files. A history extract will contain every version
of all objects with at least one version in the region. Generating a
history extract is somewhat slower than a normal data extract.

Osmium will make sure that all nodes on the vertices of the boundary of
the region will be in the extract, but nodes that happen to be directly
on the boundary, but between those vertices, might end up in the
extract or not. In almost all cases this will be good enough, but if
you want to make really sure you got everything, use a small buffer
around your region.

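As an illustration of the buffer advice above, the following Python
sketch grows a bounding box by a fixed margin and formats it for the
--bbox/-b option. The helper names (buffered_bbox, bbox_argument) and
the choice of a margin in degrees are assumptions made for this
example; they are not part of osmium.

```python
def buffered_bbox(left, bottom, right, top, buffer_deg):
    """Return the bbox grown by buffer_deg degrees on every side."""
    if buffer_deg < 0:
        raise ValueError("buffer must not be negative")
    # Clamp to the valid coordinate range, because a bounding box is
    # not allowed to span the -180/180 degree line.
    return (
        max(left - buffer_deg, -180.0),
        max(bottom - buffer_deg, -90.0),
        min(right + buffer_deg, 180.0),
        min(top + buffer_deg, 90.0),
    )


def bbox_argument(bbox):
    """Format a (left, bottom, right, top) tuple for --bbox."""
    return ",".join(f"{value:.6f}" for value in bbox)


if __name__ == "__main__":
    # Hypothetical margin of 0.01 degrees around the Munich example box.
    print(bbox_argument(buffered_bbox(11.35, 48.05, 11.73, 48.25, 0.01)))
```

The printed string can be passed to osmium extract -b. How large the
margin should be depends on the data; 0.01 degrees is only a
placeholder.
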
By default no bounds will be set in the header of the output file. Use
the --set-bounds option if you need this.

Note that osmium extract will never clip any OSM objects, i.e. it will
not remove node references outside the region from ways or unused
relation members from relations. This means you might get objects that
are not reference-complete. It has the advantage that you can use
osmium merge to merge several extracts without problems.

OPTIONS
-b, --bbox=LONG1,LAT1,LONG2,LAT2
    Set the bounding box to cut out. Cannot be used with --polygon/-p,
    --config/-c, or --directory/-d. The coordinates LONG1,LAT1 are from
    one arbitrary corner, the coordinates LONG2,LAT2 are from the
    opposite corner.

-c, --config=FILE
    Set the name of the config file. Cannot be used with the --bbox/-b
    or --polygon/-p option. If this is set, the --output/-o and
    --output-format/-f options are ignored, because they are set in the
    config file.

--clean=ATTR
    Clean the attribute (version, timestamp, changeset, uid, user) from
    the data before writing it out again. The attribute will be set to
    0 (the user will be set to the empty string). This option can be
    given multiple times. Depending on the output format these
    attributes might show up as 0 or not show up at all.

-d, --directory=DIRECTORY
    Output directory. Output file names in the config file are relative
    to this directory. Overrides the setting of the same name in the
    config file. This option is ignored when the --bbox/-b or
    --polygon/-p options are used; set the output directory and name
    with the --output/-o option in that case.

-H, --with-history
    Specify that the input file is a history file. The output file(s)
    will also be history file(s).

-p, --polygon=POLYGON_FILE
    Set the polygon to cut out based on the contents of the file. The
    file has to be a GeoJSON, poly, or OSM file as described in the
    (MULTI)POLYGON FILE FORMATS section. It has to have the right
    suffix to be detected correctly. Cannot be used with --bbox/-b,
    --config/-c, or --directory/-d.

-s, --strategy=STRATEGY
    Use the given strategy to extract the region. For possible values
    and details see the STRATEGIES section. Default is “complete_ways”.

-S, --option=OPTION=VALUE
    Set a named option for the strategy. If needed you can specify this
    option multiple times to set several options.

--set-bounds
    Set the bounds field in the header. The bounds are set to the bbox
    or envelope of the polygon specified for the extract. Note that
    strategies other than “simple” can put nodes outside those bounds
    into the output file.

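The restrictions between these options can be checked before osmium is
started. The Python sketch below builds an argument list for osmium
extract and rejects the combinations described above. The function
name build_extract_command is hypothetical, and treating ignored
options (--directory without --config, --output with --config) as
errors is a conservative choice of this example, not documented osmium
behaviour.

```python
def build_extract_command(osm_file, *, bbox=None, polygon=None, config=None,
                          directory=None, output=None, strategy=None,
                          with_history=False):
    """Return an argv list for `osmium extract`, rejecting invalid mixes."""
    regions = [(flag, value) for flag, value in
               (("--bbox", bbox), ("--polygon", polygon), ("--config", config))
               if value is not None]
    if len(regions) != 1:
        raise ValueError("give exactly one of bbox, polygon or config")
    if directory is not None and config is None:
        raise ValueError("--directory is only meaningful with --config")
    if output is not None and config is not None:
        raise ValueError("--output is ignored when --config is used")
    if with_history and strategy not in (None, "complete_ways"):
        raise ValueError("history files need the complete_ways strategy")

    flag, value = regions[0]
    if flag == "--bbox":
        # LONG1,LAT1,LONG2,LAT2 as one comma-separated argument
        value = ",".join(str(v) for v in value)
    argv = ["osmium", "extract", flag, value]
    if directory is not None:
        argv += ["--directory", directory]
    if strategy is not None:
        argv += ["--strategy", strategy]
    if with_history:
        argv.append("--with-history")
    if output is not None:
        argv += ["--output", output]
    argv.append(osm_file)
    return argv


if __name__ == "__main__":
    print(build_extract_command("germany-latest.osm.pbf",
                                bbox=(11.35, 48.05, 11.73, 48.25),
                                output="munich.osm.pbf"))
```

The returned list can be handed to subprocess.run() unchanged.
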
COMMON OPTIONS
-h, --help
    Show usage help.

-v, --verbose
    Set verbose mode. The program will output information about what
    it is doing to STDERR.

INPUT OPTIONS
-F, --input-format=FORMAT
    The format of the input file(s). Can be used to set the input
    format if it can’t be autodetected from the file name(s). This
    will set the format for all input files; there is no way to set
    the format for some input files only. See osmium-file-formats(5)
    or the libosmium manual for details.

OUTPUT OPTIONS
-f, --output-format=FORMAT
    The format of the output file. Can be used to set the output file
    format if it can’t be autodetected from the output file name. See
    osmium-file-formats(5) or the libosmium manual for details.

--fsync
    Call fsync after writing the output file to force flushing buffers
    to disk.

--generator=NAME
    The name and version of the program generating the output file. It
    will be added to the header of the output file. Default is
    “osmium/” and the version of osmium.

-o, --output=FILE
    Name of the output file. Default is `-' (STDOUT).

-O, --overwrite
    Allow an existing output file to be overwritten. Normally osmium
    will refuse to write over an existing file.

--output-header=OPTION=VALUE
    Add output header option. This command line option can be used
    multiple times for different OPTIONs. See the
    osmium-output-headers(5) man page for a list of available header
    options. For some commands you can use the special format
    “OPTION!” (i.e. an exclamation mark after the OPTION and no value
    set) to set the value to the same as in the input file.

CONFIG FILE
The config file mainly specifies the file names and the regions of the
extracts that should be created.

The config file is in JSON format. The top level is an object which
contains at least an “extracts” array. It can also contain a
“directory” entry which names the directory where all the output files
will be created:

    {
        "extracts": [...],
        "directory": "/tmp/"
    }

The extracts array specifies the extracts that should be created. Each
item in the array is an object with at least a name “output” naming the
output file and a region defined in a “bbox”, “polygon” or
“multipolygon” name. An optional “description” can be added; it will
not be used by the program but can help with documenting the file
contents. You can add an optional “output_format” if the format cannot
be detected from the “output” file name. Run “osmium help file-formats”
to get a description of allowed formats.

The optional “output_header” allows you to set additional OSM file
header settings such as the “generator”. If you set the value of a
file header setting to null, the output header will be set to the same
header from the input file.

    "extracts": [
        {
            "output": "hamburg.osm.pbf",
            "output_format": "pbf",
            "description": "optional description",
            "bbox": ...
        },
        {
            "output": "berlin.osm.pbf",
            "description": "optional description",
            "polygon": ...
        },
        {
            "output": "munich.osm.pbf",
            "output_header": {
                "generator": "MyExtractor/1.0",
                "osmosis_replication_timestamp": null
            },
            "description": "optional description",
            "multipolygon": ...
        }
    ]

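When many extracts are needed, the config file can be generated
instead of written by hand. The following Python sketch serializes a
list of extract definitions into the JSON layout described above. The
function name make_config is hypothetical, and requiring exactly one
region key per extract is an assumption of this example.

```python
import json

REGION_KEYS = ("bbox", "polygon", "multipolygon")


def make_config(extracts, directory=None):
    """Serialize extract definitions into an osmium extract config."""
    for extract in extracts:
        if "output" not in extract:
            raise ValueError("every extract needs an 'output' name")
        regions = [key for key in REGION_KEYS if key in extract]
        if len(regions) != 1:
            # Assumption of this sketch: one region per extract.
            raise ValueError("every extract needs exactly one region")
    config = {"extracts": list(extracts)}
    if directory is not None:
        config["directory"] = directory
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    print(make_config(
        [
            {
                "output": "munich.osm.pbf",
                "description": "Bounding box specified in array format",
                # None is written as JSON null: copy the input header value
                "output_header": {"osmosis_replication_timestamp": None},
                "bbox": [11.35, 48.05, 11.73, 48.25],
            },
        ],
        directory="/tmp/",
    ))
```

Write the returned string to a file and pass that file to osmium
extract with the --config/-c option.
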
There are several formats for specifying the regions:

bbox:

A bounding box in one of two formats. The first is a simple array with
four real numbers, the first two specifying the coordinates of an
arbitrary corner, the second two specifying the coordinates of the
opposite corner.

    {
        "output": "munich.osm.pbf",
        "description": "Bounding box specified in array format",
        "bbox": [11.35, 48.05, 11.73, 48.25]
    }

The second format uses an object instead of an array:

    {
        "output": "dresden.osm.pbf",
        "description": "Bounding box specified in object format",
        "bbox": {
            "left": 13.57,
            "right": 13.97,
            "top": 51.18,
            "bottom": 50.97
        }
    }

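The two bbox formats carry the same information. As a sketch, the
Python function below converts either form into the object form. It
assumes that the array holds longitude before latitude for each
corner, as in the --bbox/-b option; the name normalize_bbox is
hypothetical.

```python
def normalize_bbox(bbox):
    """Return a bbox in object form with left/bottom/right/top keys."""
    if isinstance(bbox, dict):
        return {key: bbox[key] for key in ("left", "bottom", "right", "top")}
    # Array form: two opposite corners in arbitrary order.
    lon1, lat1, lon2, lat2 = bbox
    return {
        "left": min(lon1, lon2),
        "bottom": min(lat1, lat2),
        "right": max(lon1, lon2),
        "top": max(lat1, lat2),
    }


if __name__ == "__main__":
    # Corners given top-right first; the result is the same box.
    print(normalize_bbox([11.73, 48.25, 11.35, 48.05]))
```
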
polygon:

A polygon, either specified inline in the config file or read from an
external file. See the (MULTI)POLYGON FILE FORMATS section for
external files. If specified inline this is a nested array, the outer
array defining the polygon, the next array the rings and the innermost
arrays the coordinates. This format is the same as in GeoJSON files.

In this example there is only one outer ring:

    "polygon": [[
        [9.613465, 53.58071],
        [9.647599, 53.59655],
        [9.649288, 53.61059],
        [9.613465, 53.58071]
    ]]

In each ring, the last set of coordinates should be the same as the
first set, closing the ring.

multipolygon:

A multipolygon, either specified inline in the config file or read from
an external file. See the (MULTI)POLYGON FILE FORMATS section for
external files. If specified inline this is a nested array, the outer
array defining the multipolygon, the next array the polygons, the next
the rings and the innermost arrays the coordinates. This format is the
same as in GeoJSON files.

In this example there is one outer and one inner ring:

    "multipolygon": [[[
        [6.847, 50.987],
        [6.910, 51.007],
        [7.037, 50.953],
        [6.967, 50.880],
        [6.842, 50.925],
        [6.847, 50.987]
    ],[
        [6.967, 50.954],
        [6.969, 50.920],
        [6.932, 50.928],
        [6.934, 50.950],
        [6.967, 50.954]
    ]]]

In each ring, the last set of coordinates should be the same as the
first set, closing the ring.

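The ring-closing rule can be enforced when the coordinates are
produced by a script. The Python sketch below closes every ring of an
inline polygon or multipolygon by repeating the first position at the
end where needed. The function names are hypothetical; the minimum of
four positions per closed ring follows the GeoJSON rules, which the
inline format mirrors.

```python
def close_ring(ring):
    """Return a copy of the ring whose last position equals the first."""
    closed = [list(position) for position in ring]
    if closed and closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    if len(closed) < 4:
        raise ValueError("a closed ring needs at least four positions")
    return closed


def close_polygon(polygon):
    """Close every ring of a polygon (list of rings)."""
    return [close_ring(ring) for ring in polygon]


def close_multipolygon(multipolygon):
    """Close every ring of a multipolygon (list of polygons)."""
    return [close_polygon(polygon) for polygon in multipolygon]


if __name__ == "__main__":
    open_ring = [[9.613465, 53.58071], [9.647599, 53.59655],
                 [9.649288, 53.61059]]
    print(close_polygon([open_ring]))
```
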
Osmium must check each and every node in the input data and find out in
which bounding boxes or (multi)polygons this node is. This is very
cheap for bounding boxes, but more expensive for (multi)polygons, and
it becomes more expensive the more vertices the (multi)polygon has. Use
bounding boxes or simplified polygons where possible.

Note that bounding boxes or (multi)polygons are not allowed to span the
-180/180 degree line. If you need this, cut out the regions on each
side and use osmium merge to join the resulting files.

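For bounding boxes, the split at the -180/180 degree line is simple
arithmetic. The Python sketch below assumes the common convention that
a box crossing the line is given with its western edge larger than its
eastern edge; the function name split_at_antimeridian is hypothetical.

```python
def split_at_antimeridian(west, south, east, north):
    """Split a bbox crossing the -180/180 line into two valid boxes."""
    if west <= east:
        # Does not cross the line; nothing to split.
        return [(west, south, east, north)]
    return [
        (west, south, 180.0, north),    # part west of the line
        (-180.0, south, east, north),   # part east of the line
    ]


if __name__ == "__main__":
    for box in split_at_antimeridian(170.0, -20.0, -170.0, -10.0):
        print(",".join(str(value) for value in box))
```

Each printed line is a value for the --bbox/-b option. Run osmium
extract once per box and join the two results with osmium merge.
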
(MULTI)POLYGON FILE FORMATS
External files describing a (multi)polygon are specified in the config
file using the “file_name” and “file_type” properties on the “polygon”
or “multipolygon” object:

    "polygon": {
        "file_name": "berlin.geojson",
        "file_type": "geojson"
    }

If file names don’t start with a slash (/), they are interpreted
relative to the directory where the config file is. If the “file_type”
is missing, Osmium will try to autodetect it from the suffix of the
“file_name”.

The following file types are supported:

geojson
    GeoJSON file containing exactly one Feature of type Polygon or
    MultiPolygon, or a FeatureCollection with the first Feature of
    type Polygon or MultiPolygon. Everything except the actual
    geometry (of the first Feature) is ignored.

poly
    A poly file as described in
    https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format .
    This wiki page also mentions several sources for such poly files.

osm
    An OSM file containing one or more multipolygon or boundary
    relations together with all the nodes and ways needed. Any OSM
    file format (XML, PBF, ...) supported by Osmium can be used here,
    but the correct suffix must be used, so the file format is
    detected correctly. Files for this can easily be obtained by
    searching for the area on OSM and then downloading the full
    relation using a URL like
    https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full .
    Or you can use osmium getid -r to get a specific relation from an
    OSM file. Note that both these approaches can get you very
    detailed boundaries which can take quite a while to cut out.
    Consider simplifying the boundary before use.

If there are several (multi)polygons in a poly file or OSM file, they
will be merged. The (multi)polygons must not overlap, otherwise the
result is undefined.

STRATEGIES
osmium extract can use different strategies for creating the extracts.
Depending on the strategy different objects will end up in the
extracts. The strategies differ in how much memory they need and how
often they need to read the input file. The choice of strategy depends
on how you want to use the generated extracts and how much memory and
time you have.

The default strategy is complete_ways.

Strategy simple
    Runs in a single pass. The extract will contain all nodes inside
    the region and all ways referencing those nodes as well as all
    relations referencing any nodes or ways already included. Ways
    crossing the region boundary will not be reference-complete.
    Relations will not be reference-complete. This strategy is fast,
    because it reads the input only once, but the result is not enough
    for most use cases. It is the only strategy that will work when
    reading from a socket or pipe. This strategy will not work for
    history files.

Strategy complete_ways
    Runs in two passes. The extract will contain all nodes inside the
    region and all ways referencing those nodes as well as all nodes
    referenced by those ways. The extract will also contain all
    relations referenced by nodes inside the region or ways already
    included and, recursively, their parent relations. The ways are
    reference-complete, but the relations are not.

Strategy smart
    Runs in three passes. The extract will contain all nodes inside
    the region and all ways referencing those nodes as well as all
    nodes referenced by those ways. The extract will also contain all
    relations referenced by nodes inside the region or ways already
    included and, recursively, their parent relations. The extract
    will also contain all nodes and ways (and the nodes they
    reference) referenced by relations tagged “type=multipolygon”
    directly referencing any nodes in the region or ways referencing
    nodes in the region. The ways are reference-complete, and all
    multipolygon relations referencing nodes in the regions or ways
    that have nodes in the region are reference-complete. Other
    relations are not reference-complete.

For the complete_ways strategy you can set the option
“-S relations=false” in which case no relations will be written to the
output file.

For the smart strategy you can change the types of relations that
should be reference-complete. Instead of just relations tagged
“type=multipolygon”, you can either get all relations (use
“-S types=any”) or give a list of types to the -S option:
“-S types=multipolygon,route”. Note that especially boundary relations
can be huge, so if you include them, be aware your result might be
huge.

The smart strategy allows another option
“-S complete-partial-relations=X”. If this is set, all relations that
have more than X percent of their members already in the extract will
have their full set of members in the extract. So this allows
completing almost complete relations. It can be useful for instance to
make sure a boundary relation is complete even if some of it is outside
the polygon used for extraction.

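The threshold rule of the complete-partial-relations option can be
written down as a small predicate. The Python sketch below only
illustrates the "more than X percent" comparison as described above.
How osmium counts members internally is not documented here, so the
function name and the handling of empty relations are assumptions.

```python
def should_complete(members_in_extract, total_members, threshold_percent):
    """Return True if more than threshold_percent of members are present."""
    if total_members == 0:
        return False  # assumption: an empty relation is never completed
    # Integer comparison avoids rounding problems with percentages.
    return 100 * members_in_extract > threshold_percent * total_members


if __name__ == "__main__":
    # 9 of 10 members present is more than 80 percent; 8 of 10 is not.
    print(should_complete(9, 10, 80), should_complete(8, 10, 80))
```
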
DIAGNOSTICS
osmium extract exits with exit code

0   if everything went alright,

1   if there was an error processing the data, or

2   if there was a problem with the command line arguments, config
    file or polygon files.

MEMORY USAGE
Memory usage of osmium extract depends on the number of extracts and on
the strategy used. For the simple strategy it will at least be the
number of extracts times the highest node ID used divided by 8. For
the complete_ways strategy it is twice that, and for the smart strategy
a bit more.

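The rule of thumb above can be turned into a rough lower bound. The
Python sketch below assumes that the result is in bytes (one bit per
possible node ID and extract), which the text does not state
explicitly. For the smart strategy it returns the complete_ways figure,
because "a bit more" is not quantified. The function name
min_memory_bytes is hypothetical.

```python
def min_memory_bytes(num_extracts, highest_node_id, strategy="complete_ways"):
    """Rough lower bound on memory use, in bytes (unit is an assumption)."""
    base = num_extracts * highest_node_id // 8
    if strategy == "simple":
        return base
    if strategy in ("complete_ways", "smart"):
        # smart needs "a bit more" than this; the amount is not specified.
        return 2 * base
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Hypothetical input: 10 extracts, highest node ID 8 000 000 000.
    estimate = min_memory_bytes(10, 8_000_000_000)
    print(f"at least {estimate / 1e9:.0f} GB")
```
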
If you want to split a large file into many extracts, do this in
several steps. First create several larger extracts and then split them
again and again into smaller pieces.

EXAMPLES
See the example config files in the extract-example-config directory.
To try it:

    osmium extract -v -c extract-example-config/extracts.json \
        germany-latest.osm.pbf

Extract the city of Karlsruhe using a boundary polygon:

    osmium extract -p karlsruhe-boundary.osm.bz2 germany-latest.osm.pbf \
        -o karlsruhe.osm.pbf

Extract the city of Munich using a bounding box:

    osmium extract -b 11.35,48.05,11.73,48.25 germany-latest.osm.pbf \
        -o munich.osm.pbf

SEE ALSO
• osmium(1), osmium-file-formats(5), osmium-output-headers(5),
  osmium-getid(1), osmium-merge(1)

• Osmium website (https://osmcode.org/osmium-tool/)

COPYRIGHT
Copyright (C) 2013-2022 Jochen Topf <jochen@topf.org>.

License GPLv3+: GNU GPL version 3 or later
<https://gnu.org/licenses/gpl.html>. This is free software: you are
free to change and redistribute it. There is NO WARRANTY, to the extent
permitted by law.

CONTACT
If you have any questions or want to report a bug, please go to
https://osmcode.org/contact.html

AUTHORS
Jochen Topf <jochen@topf.org>.

1.14.0                                                       OSMIUM-EXTRACT(1)