OSMIUM-EXTRACT(1)                                            OSMIUM-EXTRACT(1)

NAME
osmium-extract - create geographical extracts from an OSM file

SYNOPSIS
osmium extract --config CONFIG-FILE [OPTIONS] OSM-FILE
osmium extract --bbox LEFT,BOTTOM,RIGHT,TOP [OPTIONS] OSM-FILE
osmium extract --polygon POLYGON-FILE [OPTIONS] OSM-FILE

DESCRIPTION
Create geographical extracts from an OSM data file or an OSM history
file. The region (geographical extent) can be given as a bounding box
or as a (multi)polygon.

There are three ways of calling this command:

· Specify a config file with the --config/-c option. It can define any
  number of regions you want to cut out. See the CONFIG FILE section
  for details.

· Specify a bounding box to cut out with the --bbox/-b option.

· Specify a (multi)polygon to cut out with the --polygon/-p option.

The input file is assumed to be ordered in the usual order: nodes
first, then ways, then relations.

If the --with-history/-H option is used, the command will work
correctly for history files. This currently works for the
complete_ways strategy only. The simple and smart strategies do not
work with history files. A history extract will contain every version
of all objects with at least one version in the region. Generating a
history extract is somewhat slower than a normal data extract.

Osmium makes sure that all nodes located on the vertices of the region
boundary are in the extract. Nodes that lie directly on the boundary
but between those vertices might or might not end up in the extract.
In almost all cases this will be good enough, but if you want to make
really sure you got everything, use a small buffer around your region.
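
One way to add such a buffer to a bounding box is sketched below in
Python. This is an illustration only: the function names and the
default margin of 0.001 degrees are assumptions made for the example
and are not part of osmium.

```python
def buffered_bbox(left, bottom, right, top, margin=0.001):
    """Grow a bounding box by `margin` degrees on every side.

    The result is clamped to the valid longitude/latitude ranges. The
    default margin is an arbitrary illustration value.
    """
    return (
        max(left - margin, -180.0),
        max(bottom - margin, -90.0),
        min(right + margin, 180.0),
        min(top + margin, 90.0),
    )


def bbox_argument(bbox):
    """Format a bbox tuple as the LEFT,BOTTOM,RIGHT,TOP string for --bbox."""
    return ",".join(f"{value:.6f}" for value in bbox)


# The Munich bounding box used elsewhere in this manual, slightly enlarged.
print(bbox_argument(buffered_bbox(11.35, 48.05, 11.73, 48.25)))
```

The printed string can be passed as the argument of the --bbox/-b
option.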

By default no bounds will be set in the header of the output file. Use
the --set-bounds option if you need this.

Note that osmium extract will never clip any OSM objects, i.e. it will
not remove node references outside the region from ways or unused
relation members from relations. This means you might get objects that
are not reference-complete. It has the advantage that you can use
osmium merge to merge several extracts without problems.

OPTIONS
-b, --bbox=LONG1,LAT1,LONG2,LAT2
    Set the bounding box to cut out. Cannot be used with
    --polygon/-p, --config/-c, or --directory/-d. The coordinates
    LONG1,LAT1 are from one arbitrary corner, the coordinates
    LONG2,LAT2 are from the opposite corner.

-c, --config=FILE
    Set the name of the config file. Cannot be used with the
    --bbox/-b or --polygon/-p option. If this is set, the
    --output/-o and --output-format/-f options are ignored, because
    they are set in the config file.

-d, --directory=DIRECTORY
    Output directory. Output file names in the config file are
    relative to this directory. Overrides the setting of the same
    name in the config file. This option is ignored when the
    --bbox/-b or --polygon/-p options are used; set the output
    directory and name with the --output/-o option in that case.

-H, --with-history
    Specify that the input file is a history file. The output
    file(s) will also be history file(s).

-p, --polygon=POLYGON_FILE
    Set the polygon to cut out based on the contents of the file.
    The file has to be a GeoJSON, poly, or OSM file as described in
    the (MULTI)POLYGON FILE FORMATS section. It has to have the
    right suffix to be detected correctly. Cannot be used with
    --bbox/-b, --config/-c, or --directory/-d.

-s, --strategy=STRATEGY
    Use the given strategy to extract the region. For possible
    values and details see the STRATEGIES section. Default is
    “complete_ways”.

-S, --option=OPTION=VALUE
    Set a named option for the strategy. If needed you can specify
    this option multiple times to set several options.

--set-bounds
    Set the bounds field in the header. The bounds are set to the
    bbox or envelope of the polygon specified for the extract. Note
    that strategies other than “simple” can put nodes outside those
    bounds into the output file.

COMMON OPTIONS
-h, --help
    Show usage help.

-v, --verbose
    Set verbose mode. The program will output information about
    what it is doing to STDERR.

INPUT OPTIONS
-F, --input-format=FORMAT
    The format of the input file(s). Can be used to set the input
    format if it can’t be autodetected from the file name(s). This
    will set the format for all input files; there is no way to set
    the format for some input files only. See
    osmium-file-formats(5) or the libosmium manual for details.

OUTPUT OPTIONS
-f, --output-format=FORMAT
    The format of the output file. Can be used to set the output
    file format if it can’t be autodetected from the output file
    name. See osmium-file-formats(5) or the libosmium manual for
    details.

--fsync
    Call fsync after writing the output file to force flushing
    buffers to disk.

--generator=NAME
    The name and version of the program generating the output file.
    It will be added to the header of the output file. Default is
    “osmium/” and the version of osmium.

-o, --output=FILE
    Name of the output file. Default is `-' (STDOUT).

-O, --overwrite
    Allow an existing output file to be overwritten. Normally
    osmium will refuse to write over an existing file.

--output-header=OPTION=VALUE
    Add output header option. This command line option can be used
    multiple times for different OPTIONs. See the libosmium manual
    for a list of available header options. For some commands you
    can use the special format “OPTION!” (i.e. an exclamation mark
    after the OPTION and no value set) to set the value to the same
    as in the input file.

CONFIG FILE
The config file mainly specifies the file names and the regions of the
extracts that should be created.

The config file is in JSON format. The top level is an object which
contains at least an “extracts” array. It can also contain a
“directory” entry which names the directory where all the output files
will be created:

    {
        "extracts": [...],
        "directory": "/tmp/"
    }

The extracts array specifies the extracts that should be created. Each
item in the array is an object with at least an “output” property
naming the output file and a region defined in a “bbox”, “polygon” or
“multipolygon” property. An optional “description” can be added; it is
not used by the program but can help with documenting the file
contents. You can add an optional “output_format” if the format cannot
be detected from the “output” file name. Run “osmium help file-formats”
to get a description of allowed formats. The optional “output_header”
allows you to set additional OSM file header settings such as the
“generator”.

    "extracts": [
        {
            "output": "hamburg.osm.pbf",
            "output_format": "pbf",
            "description": "optional description",
            "bbox": ...
        },
        {
            "output": "berlin.osm.pbf",
            "description": "optional description",
            "polygon": ...
        },
        {
            "output": "munich.osm.pbf",
            "output_header": {
                "generator": "MyExtractor/1.0"
            },
            "description": "optional description",
            "multipolygon": ...
        }
    ]
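
A config file with this structure can also be generated by a script.
The following Python sketch builds one and runs a few basic checks
before it is written out. The check_config helper is hypothetical, and
its rule that each extract has exactly one region property is an
assumption made for the example; osmium performs its own validation.

```python
import json

config = {
    "directory": "/tmp/",
    "extracts": [
        {
            "output": "munich.osm.pbf",
            "description": "Munich bounding box",
            "bbox": [11.35, 48.05, 11.73, 48.25],
        },
        {
            "output": "dresden.osm.pbf",
            "output_header": {"generator": "MyExtractor/1.0"},
            "bbox": {"left": 13.57, "right": 13.97,
                     "top": 51.18, "bottom": 50.97},
        },
    ],
}


def check_config(cfg):
    """Return a list of problems found; an empty list means none were found."""
    region_keys = {"bbox", "polygon", "multipolygon"}
    extracts = cfg.get("extracts")
    if not isinstance(extracts, list):
        return ['top-level "extracts" array is missing']
    problems = []
    for index, extract in enumerate(extracts):
        if "output" not in extract:
            problems.append(f'extract {index}: missing "output"')
        # Assumption of this sketch: exactly one region property per extract.
        if len(region_keys.intersection(extract)) != 1:
            problems.append(f"extract {index}: need exactly one region")
    return problems


config_text = json.dumps(config, indent=4)
print(check_config(config))
```

The string in config_text can be written to a file and passed to the
--config/-c option.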

There are several formats for specifying the regions:

bbox:

A bounding box in one of two formats. The first is a simple array with
four real numbers, the first two specifying the coordinates of an
arbitrary corner, the second two specifying the coordinates of the
opposite corner.

    {
        "output": "munich.osm.pbf",
        "description": "Bounding box specified in array format",
        "bbox": [11.35, 48.05, 11.73, 48.25]
    }

The second format uses an object instead of an array:

    {
        "output": "dresden.osm.pbf",
        "description": "Bounding box specified in object format",
        "bbox": {
            "left": 13.57,
            "right": 13.97,
            "top": 51.18,
            "bottom": 50.97
        }
    }
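
Scripts that read such config files may want to handle both bbox
formats uniformly. A minimal Python sketch follows; the function name
normalize_bbox is made up for this example.

```python
def normalize_bbox(bbox):
    """Return (left, bottom, right, top) for either config-file bbox format."""
    if isinstance(bbox, dict):
        x1, y1 = bbox["left"], bbox["bottom"]
        x2, y2 = bbox["right"], bbox["top"]
    else:
        x1, y1, x2, y2 = bbox
    # The array form may start at any corner, so sort each axis.
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


print(normalize_bbox([11.73, 48.25, 11.35, 48.05]))
```

Both the array form, starting from any corner, and the object form end
up as the same left, bottom, right, top tuple.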

polygon:

A polygon, either specified inline in the config file or read from an
external file. See the (MULTI)POLYGON FILE FORMATS section for
external files. If specified inline this is a nested array: the outer
array defines the polygon, the next level the rings, and the innermost
arrays the coordinates. This format is the same as in GeoJSON files.

In this example there is only one outer ring:

    "polygon": [[
        [9.613465, 53.58071],
        [9.647599, 53.59655],
        [9.649288, 53.61059],
        [9.613465, 53.58071]
    ]]

In each ring, the last set of coordinates should be the same as the
first set, closing the ring.
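
When rings are produced by a script, the closing coordinate is easy to
forget. The following Python sketch closes a ring if needed. The helper
name close_ring and its minimum-size check are illustrative assumptions,
not something osmium provides.

```python
def close_ring(ring):
    """Return a copy of `ring` whose last coordinate pair equals the first."""
    closed = [list(point) for point in ring]
    if closed and closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    # A closed ring around an area needs at least four positions.
    if len(closed) < 4:
        raise ValueError("ring has too few positions")
    return closed


ring = close_ring([
    [9.613465, 53.58071],
    [9.647599, 53.59655],
    [9.649288, 53.61059],
])
polygon = [ring]  # a polygon with a single outer ring
```

The resulting polygon value has the same nesting as the inline example
above.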

multipolygon:

A multipolygon, either specified inline in the config file or read from
an external file. See the (MULTI)POLYGON FILE FORMATS section for
external files. If specified inline this is a nested array: the outer
array defines the multipolygon, the next level the polygons, the next
the rings, and the innermost arrays the coordinates. This format is the
same as in GeoJSON files.

In this example there is one outer and one inner ring:

    "multipolygon": [[[
        [6.847, 50.987],
        [6.910, 51.007],
        [7.037, 50.953],
        [6.967, 50.880],
        [6.842, 50.925],
        [6.847, 50.987]
    ],[
        [6.967, 50.954],
        [6.969, 50.920],
        [6.932, 50.928],
        [6.934, 50.950],
        [6.967, 50.954]
    ]]]

In each ring, the last set of coordinates should be the same as the
first set, closing the ring.

Osmium must check every node in the input data to find out which
bounding boxes or (multi)polygons contain it. This is very cheap for
bounding boxes, but more expensive for (multi)polygons, and the cost
grows with the number of vertices of the (multi)polygon. Use bounding
boxes or simplified polygons where possible.
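
To illustrate why the cost grows with the number of vertices, here is a
generic ray-casting point-in-polygon test in Python. It is a textbook
technique shown for illustration and is not claimed to be the algorithm
Osmium uses internally. The bounding box test takes constant time,
while the ring test visits every edge.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count the edges crossed by a ray going east."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_polygon(lon, lat, polygon):
    """`polygon` is [outer_ring, inner_ring, ...] with closed rings."""
    outer, holes = polygon[0], polygon[1:]
    xs = [x for x, _ in outer]
    ys = [y for _, y in outer]
    # Cheap bounding-box rejection before the per-edge work.
    if not (min(xs) <= lon <= max(xs) and min(ys) <= lat <= max(ys)):
        return False
    if not point_in_ring(lon, lat, outer):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in holes)


# The polygon with one inner ring from the multipolygon example.
polygon = [
    [[6.847, 50.987], [6.910, 51.007], [7.037, 50.953],
     [6.967, 50.880], [6.842, 50.925], [6.847, 50.987]],
    [[6.967, 50.954], [6.969, 50.920], [6.932, 50.928],
     [6.934, 50.950], [6.967, 50.954]],
]
print(point_in_polygon(6.90, 50.96, polygon))  # inside the outer ring
print(point_in_polygon(6.95, 50.94, polygon))  # inside the inner ring
```

Every additional vertex adds one more edge to the loop in
point_in_ring, which is run for each candidate node.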

Note that bounding boxes or (multi)polygons are not allowed to span the
-180/180 degree line. If you need this, cut out the regions on each
side and use osmium merge to join the resulting files.
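
Splitting a bounding box at the -180/180 degree line could look like
the Python sketch below. The convention that a box with west greater
than east wraps around the line is an assumption of this example, as
is the function name.

```python
def split_at_antimeridian(west, south, east, north):
    """Split a box that wraps around the 180 degree line into two boxes.

    Assumption: west > east means the box crosses the -180/180 line.
    Boxes that do not cross it are returned unchanged.
    """
    if west <= east:
        return [(west, south, east, north)]
    return [
        (west, south, 180.0, north),
        (-180.0, south, east, north),
    ]


for box in split_at_antimeridian(177.0, -20.0, -178.0, -15.0):
    print(",".join(str(value) for value in box))
```

Each printed line can be used as a --bbox argument in a separate run,
and the two output files can then be joined with osmium merge.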

(MULTI)POLYGON FILE FORMATS
External files describing a (multi)polygon are specified in the config
file using the “file_name” and “file_type” properties on the “polygon”
or “multipolygon” object:

    "polygon": {
        "file_name": "berlin.geojson",
        "file_type": "geojson"
    }

If file names don’t start with a slash (/), they are interpreted
relative to the directory where the config file is. If the “file_type”
is missing, Osmium will try to autodetect it from the suffix of the
“file_name”.
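
The lookup rules can be sketched in Python as follows. The suffix table
is an assumption made for this example and may not match the exact set
of suffixes your osmium version recognizes; the function name is
hypothetical as well.

```python
import posixpath

# Assumed suffix table, for illustration only.
SUFFIX_TO_TYPE = (
    (".geojson", "geojson"),
    (".json", "geojson"),
    (".poly", "poly"),
    (".osm", "osm"),
    (".osm.bz2", "osm"),
    (".osm.gz", "osm"),
    (".pbf", "osm"),
)


def resolve_polygon_file(spec, config_path):
    """Return (path, file_type) for a "polygon" or "multipolygon" object."""
    name = spec["file_name"]
    if not name.startswith("/"):
        # Relative names are taken relative to the config file's directory.
        name = posixpath.join(posixpath.dirname(config_path), name)
    file_type = spec.get("file_type")
    if file_type is None:
        for suffix, detected in SUFFIX_TO_TYPE:
            if name.endswith(suffix):
                file_type = detected
                break
        else:
            raise ValueError(f"cannot detect file type of {name!r}")
    return name, file_type


print(resolve_polygon_file({"file_name": "berlin.geojson"},
                           "/etc/extracts/config.json"))
```

An explicit “file_type” always takes precedence over the suffix in this
sketch.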

The following file types are supported:

geojson
    GeoJSON file containing exactly one Feature of type Polygon or
    MultiPolygon, or a FeatureCollection with the first Feature of
    type Polygon or MultiPolygon. Everything except the actual
    geometry (of the first Feature) is ignored.

poly
    A poly file as described in
    https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format .
    This wiki page also mentions several sources for such poly files.

osm
    An OSM file containing one or more multipolygon or boundary
    relations together with all the nodes and ways needed. Any OSM
    file format (XML, PBF, ...) supported by Osmium can be used
    here, but the correct suffix must be used, so the file format is
    detected correctly. Files for this can easily be obtained by
    searching for the area on OSM and then downloading the full
    relation using a URL like
    https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full .
    Or you can use osmium getid -r to get a specific relation from an
    OSM file. Note that both these approaches can get you very
    detailed boundaries which can take quite a while to cut out.
    Consider simplifying the boundary before use.

If there are several (multi)polygons in a poly file or OSM file, they
will be merged. The (multi)polygons must not overlap, otherwise the
result is undefined.
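
The GeoJSON rule described above (a single Feature, or the first
Feature of a FeatureCollection) can be illustrated with a short Python
sketch. The function name is made up for this example, and the error
handling is an assumption rather than a description of Osmium's
behaviour.

```python
import json


def geometry_from_geojson(text):
    """Return the Polygon or MultiPolygon geometry that defines the region."""
    data = json.loads(text)
    if data.get("type") == "FeatureCollection":
        features = data.get("features") or []
        if not features:
            raise ValueError("FeatureCollection has no features")
        data = features[0]  # only the first Feature is considered
    if data.get("type") != "Feature":
        raise ValueError("expected a Feature or FeatureCollection")
    geometry = data.get("geometry") or {}
    if geometry.get("type") not in ("Polygon", "MultiPolygon"):
        raise ValueError("geometry must be a Polygon or MultiPolygon")
    return geometry


example = json.dumps({
    "type": "Feature",
    "properties": {"name": "ignored"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [9.613465, 53.58071],
            [9.647599, 53.59655],
            [9.649288, 53.61059],
            [9.613465, 53.58071],
        ]],
    },
})
print(geometry_from_geojson(example)["type"])
```

Properties and any further Features are dropped; only the geometry is
returned.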

STRATEGIES
osmium extract can use different strategies for creating the extracts.
Depending on the strategy different objects will end up in the
extracts. The strategies differ in how much memory they need and how
often they need to read the input file. The choice of strategy depends
on how you want to use the generated extracts and how much memory and
time you have.

The default strategy is complete_ways.

Strategy simple
    Runs in a single pass. The extract will contain all nodes
    inside the region and all ways referencing those nodes as well
    as all relations referencing any nodes or ways already included.
    Ways crossing the region boundary will not be
    reference-complete. Relations will not be reference-complete.
    This strategy is fast, because it reads the input only once, but
    the result is not enough for most use cases. It is the only
    strategy that will work when reading from a socket or pipe. This
    strategy will not work for history files.

Strategy complete_ways
    Runs in two passes. The extract will contain all nodes inside
    the region and all ways referencing those nodes as well as all
    nodes referenced by those ways. The extract will also contain
    all relations referenced by nodes inside the region or ways
    already included and, recursively, their parent relations. The
    ways are reference-complete, but the relations are not.

Strategy smart
    Runs in three passes. The extract will contain all nodes inside
    the region and all ways referencing those nodes as well as all
    nodes referenced by those ways. The extract will also contain
    all relations referenced by nodes inside the region or ways
    already included and, recursively, their parent relations. The
    extract will also contain all nodes and ways (and the nodes they
    reference) referenced by relations tagged “type=multipolygon”
    directly referencing any nodes in the region or ways referencing
    nodes in the region. The ways are reference-complete, and all
    multipolygon relations referencing nodes in the regions or ways
    that have nodes in the region are reference-complete. Other
    relations are not reference-complete.
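
The difference between the simple and complete_ways strategies for
nodes and ways can be shown with a toy model in Python. It works on
small in-memory sets instead of OSM files and ignores relations
entirely, so it is only a sketch of the selection rules described
above, not of the actual implementation.

```python
def extract_nodes_and_ways(nodes_in_region, ways, strategy):
    """Toy model of which nodes and ways end up in an extract.

    nodes_in_region: set of node IDs located inside the region
    ways: dict mapping way ID -> list of referenced node IDs
    """
    # Every way referencing at least one node in the region is kept.
    kept_ways = {
        way_id for way_id, refs in ways.items()
        if nodes_in_region.intersection(refs)
    }
    kept_nodes = set(nodes_in_region)
    if strategy in ("complete_ways", "smart"):
        # Extra pass: add every node referenced by a kept way.
        for way_id in kept_ways:
            kept_nodes.update(ways[way_id])
    return kept_nodes, kept_ways


ways = {10: [1, 2], 11: [2, 3], 12: [4, 5]}
print(extract_nodes_and_ways({1, 2}, ways, "simple"))
print(extract_nodes_and_ways({1, 2}, ways, "complete_ways"))
```

With the simple strategy way 11 is kept but its node 3, which lies
outside the region, is missing. With complete_ways node 3 is added, so
way 11 is reference-complete.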

For the smart strategy you can change the types of relations that
should be reference-complete. Instead of just relations tagged
“type=multipolygon”, you can either get all relations (use “-S
types=any”) or give a list of types to the -S option: “-S
types=multipolygon,route”. Note that especially boundary relations can
be huge, so if you include them, be aware your result might be huge.

The smart strategy allows another option “-S
complete-partial-relations=X”. If this is set, all relations that have
more than X percent of their members already in the extract will have
their full set of members in the extract. So this allows completing
almost complete relations. It can be useful for instance to make sure
a boundary relation is complete even if some of it is outside the
polygon used for extraction.
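
The threshold rule can be written down as a short Python sketch. The
function and parameter names are hypothetical, and members are
represented here as (type, ID) tuples purely for illustration.

```python
def should_complete(members, in_extract, threshold_percent):
    """True if more than `threshold_percent` of the members are present."""
    if not members:
        return False
    present = sum(1 for member in members if member in in_extract)
    # Integer comparison avoids rounding issues with percentages.
    return present * 100 > threshold_percent * len(members)


members = [("way", way_id) for way_id in range(1, 11)]      # 10 members
in_extract = {("way", way_id) for way_id in range(1, 10)}   # 9 present

print(should_complete(members, in_extract, 80))
print(should_complete(members, in_extract, 90))
```

With 9 of 10 members present, the relation would be completed for
X=80, but not for X=90, because the share must be strictly greater
than X.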

DIAGNOSTICS
osmium extract exits with exit code

0   if everything went alright,

1   if there was an error processing the data, or

2   if there was a problem with the command line arguments, config
    file or polygon files.
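
A wrapper script can use these exit codes to report what went wrong.
The Python sketch below assumes that an osmium executable is available
on the PATH; the helper names are invented for this example.

```python
import subprocess

EXIT_MEANINGS = {
    0: "success",
    1: "error processing the data",
    2: "problem with command line arguments, config file or polygon files",
}


def describe_exit_code(code):
    """Map an exit code of osmium extract to a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_extract(args):
    """Run `osmium extract` with the given arguments and describe the result."""
    result = subprocess.run(["osmium", "extract", *args])
    return result.returncode, describe_exit_code(result.returncode)
```

A call such as run_extract(["-b", "11.35,48.05,11.73,48.25",
"germany-latest.osm.pbf", "-o", "munich.osm.pbf"]) returns the exit
code together with its description.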

MEMORY USAGE
Memory usage of osmium extract depends on the number of extracts and on
the strategy used. For the simple strategy it will be at least the
number of extracts times the highest node ID used divided by 8. For
the complete_ways strategy it is twice that, and for the smart strategy
a bit more.
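
The rule of thumb can be turned into a small calculation. The Python
sketch below reads the result as a number of bytes, which is an
assumption (it corresponds to one bit per node ID and extract). The
highest node ID in the example is a hypothetical round number, and the
factor for the smart strategy is only a lower bound, since the text
says it needs a bit more than complete_ways.

```python
# Multipliers relative to the simple strategy. The value for "smart" is
# a lower bound only: it needs "a bit more" than complete_ways.
STRATEGY_FACTOR = {"simple": 1, "complete_ways": 2, "smart": 2}


def min_memory_bytes(num_extracts, highest_node_id, strategy="complete_ways"):
    """Lower bound: extracts * highest node ID / 8, scaled by strategy."""
    return num_extracts * highest_node_id // 8 * STRATEGY_FACTOR[strategy]


highest_node_id = 8_000_000_000  # hypothetical value for illustration
for strategy in ("simple", "complete_ways"):
    gigabytes = min_memory_bytes(10, highest_node_id, strategy) / 1e9
    print(f"{strategy}: at least {gigabytes:.0f} GB for 10 extracts")
```

Under these assumptions ten extracts need at least 10 GB with the
simple strategy and at least 20 GB with complete_ways.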

If you want to split a large file into many extracts, do this in
several steps. First create several larger extracts and then split them
again and again into smaller pieces.

EXAMPLES
See the example config files in the extract-example-config directory.
To try it:

    osmium extract -v -c extract-example-config/extracts.json \
        germany-latest.osm.pbf

Extract the city of Karlsruhe using a boundary polygon:

    osmium extract -p karlsruhe-boundary.osm.bz2 germany-latest.osm.pbf \
        -o karlsruhe.osm.pbf

Extract the city of Munich using a bounding box:

    osmium extract -b 11.35,48.05,11.73,48.25 germany-latest.osm.pbf \
        -o munich.osm.pbf

SEE ALSO
· osmium(1), osmium-file-formats(5), osmium-getid(1), osmium-merge(1)

· Osmium website (https://osmcode.org/osmium-tool/)

COPYRIGHT
Copyright (C) 2013-2020 Jochen Topf <jochen@topf.org>.

License GPLv3+: GNU GPL version 3 or later
<https://gnu.org/licenses/gpl.html>. This is free software: you are
free to change and redistribute it. There is NO WARRANTY, to the
extent permitted by law.

CONTACT
If you have any questions or want to report a bug, please go to
https://osmcode.org/contact.html

AUTHORS
Jochen Topf <jochen@topf.org>.

1.12.1                                                       OSMIUM-EXTRACT(1)