OSMIUM-EXTRACT(1)                                            OSMIUM-EXTRACT(1)

NAME

       osmium-extract - create geographical extracts from an OSM file

SYNOPSIS

       osmium extract --config CONFIG-FILE [OPTIONS] OSM-FILE
       osmium extract --bbox LEFT,BOTTOM,RIGHT,TOP [OPTIONS] OSM-FILE
       osmium extract --polygon POLYGON-FILE [OPTIONS] OSM-FILE

DESCRIPTION

       Create geographical extracts from an OSM data file or an OSM history
       file.  The region (geographical extent) can be given as a bounding box
       or as a (multi)polygon.

       There are three ways of calling this command:

       · Specify a config file with the --config/-c option.  It can define
         any number of regions you want to cut out.  See the CONFIG FILE
         section for details.

       · Specify a bounding box to cut out with the --bbox/-b option.

       · Specify a (multi)polygon to cut out with the --polygon/-p option.

       The input file is assumed to be ordered in the usual order: nodes
       first, then ways, then relations.

       If the --with-history option is used, the command will work correctly
       for history files.  This currently works for the complete_ways
       strategy only; the simple and smart strategies do not work with
       history files.  A history extract will contain every version of all
       objects with at least one version in the region.  Generating a history
       extract is somewhat slower than generating a normal data extract.

       Osmium makes sure that all nodes on the vertices of the region
       boundary are in the extract, but nodes that lie directly on the
       boundary between those vertices might or might not end up in the
       extract.  In almost all cases this is good enough, but if you want to
       be sure you got everything, use a small buffer around your region.

       By default no bounds will be set in the header of the output file.
       Use the --set-bounds option if you need this.

       Note that osmium extract will never clip any OSM objects, i.e. it will
       not remove node references outside the region from ways or unused
       relation members from relations.  This means you might get objects
       that are not reference-complete.  It has the advantage that you can
       use osmium merge to merge several extracts without problems.

OPTIONS

       -b, --bbox=LONG1,LAT1,LONG2,LAT2
              Set the bounding box to cut out.  Cannot be used with
              --polygon/-p, --config/-c, or --directory/-d.  The coordinates
              LONG1,LAT1 are from one arbitrary corner, the coordinates
              LONG2,LAT2 are from the opposite corner.

       -c, --config=FILE
              Set the name of the config file.  Cannot be used with the
              --bbox/-b or --polygon/-p option.  If this is set, the
              --output/-o and --output-format/-f options are ignored, because
              they are set in the config file.

       -d, --directory=DIRECTORY
              Output directory.  Output file names in the config file are
              relative to this directory.  Overrides the setting of the same
              name in the config file.  This option is ignored when the
              --bbox/-b or --polygon/-p options are used; set the output
              directory and name with the --output/-o option in that case.

       -H, --with-history
              Specify that the input file is a history file.  The output
              file(s) will also be history file(s).

       -p, --polygon=POLYGON_FILE
              Set the polygon to cut out based on the contents of the file.
              The file has to be a GeoJSON, poly, or OSM file as described in
              the (MULTI)POLYGON FILE FORMATS section.  It has to have the
              right suffix to be detected correctly.  Cannot be used with
              --bbox/-b, --config/-c, or --directory/-d.

       -s, --strategy=STRATEGY
              Use the given strategy to extract the region.  For possible
              values and details see the STRATEGIES section.  Default is
              “complete_ways”.

       -S, --option=OPTION=VALUE
              Set a named option for the strategy.  If needed you can specify
              this option multiple times to set several options.

       --set-bounds
              Set the bounds field in the header.  The bounds are set to the
              bbox or envelope of the polygon specified for the extract.
              Note that strategies other than “simple” can put nodes outside
              those bounds into the output file.

COMMON OPTIONS

       -h, --help
              Show usage help.

       -v, --verbose
              Set verbose mode.  The program will output information about
              what it is doing to STDERR.

INPUT OPTIONS

       -F, --input-format=FORMAT
              The format of the input file(s).  Can be used to set the input
              format if it can't be autodetected from the file name(s).  This
              will set the format for all input files; there is no way to set
              the format for some input files only.  See
              osmium-file-formats(5) or the libosmium manual for details.

OUTPUT OPTIONS

       -f, --output-format=FORMAT
              The format of the output file.  Can be used to set the output
              file format if it can't be autodetected from the output file
              name.  See osmium-file-formats(5) or the libosmium manual for
              details.

       --fsync
              Call fsync after writing the output file to force flushing
              buffers to disk.

       --generator=NAME
              The name and version of the program generating the output file.
              It will be added to the header of the output file.  Default is
              “osmium/” and the version of osmium.

       -o, --output=FILE
              Name of the output file.  Default is `-' (STDOUT).

       -O, --overwrite
              Allow an existing output file to be overwritten.  Normally
              osmium will refuse to write over an existing file.

       --output-header=OPTION=VALUE
              Add output header option.  This command line option can be used
              multiple times for different OPTIONs.  See the libosmium manual
              for a list of available header options.

CONFIG FILE

       The config file mainly specifies the file names and the regions of the
       extracts that should be created.

       The config file is in JSON format.  The top level is an object which
       contains at least an “extracts” array.  It can also contain a
       “directory” entry which names the directory where all the output files
       will be created:

              {
                  "extracts": [...],
                  "directory": "/tmp/"
              }

       The extracts array specifies the extracts that should be created.
       Each item in the array is an object with at least an “output” entry
       naming the output file and a region defined in a “bbox”, “polygon” or
       “multipolygon” entry.  An optional “description” can be added; it is
       not used by the program but can help with documenting the file
       contents.  You can add an optional “output_format” if the format
       cannot be detected from the “output” file name.  Run “osmium help
       file-formats” to get a description of allowed formats.  The optional
       “output_header” allows you to set additional OSM file header settings
       such as the “generator”.

              "extracts": [
                  {
                      "output": "hamburg.osm.pbf",
                      "output_format": "pbf",
                      "description": "optional description",
                      "bbox": ...
                  },
                  {
                      "output": "berlin.osm.pbf",
                      "description": "optional description",
                      "polygon": ...
                  },
                  {
                      "output": "munich.osm.pbf",
                      "output_header": {
                          "generator": "MyExtractor/1.0"
                      },
                      "description": "optional description",
                      "multipolygon": ...
                  }
              ]
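
       A config file of this shape can also be generated by a script.  The
       following is a minimal sketch in Python that uses only the keys
       described above.  The helper name make_config is hypothetical and not
       part of osmium, and the check for exactly one region key per extract
       is an assumption of this sketch, not documented behaviour.

```python
import json


def make_config(extracts, directory=None):
    """Return a dict laid out like the config file described above.

    Hypothetical helper: each item of `extracts` is a dict with an "output"
    name and (by assumption) exactly one of "bbox", "polygon", "multipolygon".
    """
    region_keys = {"bbox", "polygon", "multipolygon"}
    for extract in extracts:
        if "output" not in extract:
            raise ValueError("extract is missing 'output'")
        if len(region_keys.intersection(extract)) != 1:
            raise ValueError("extract needs exactly one region key")
    config = {"extracts": list(extracts)}
    if directory is not None:
        config["directory"] = directory
    return config


config = make_config(
    [{"output": "munich.osm.pbf", "bbox": [11.35, 48.05, 11.73, 48.25]}],
    directory="/tmp/",
)
# JSON text that could be written to a file and passed to --config/-c.
config_text = json.dumps(config, indent=4)
```

       The resulting text can be saved under any file name and handed to
       osmium extract with the --config/-c option.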

       There are several formats for specifying the regions:

       bbox:

       A bounding box in one of two formats.  The first is a simple array
       with four real numbers, the first two specifying the coordinates of an
       arbitrary corner, the second two specifying the coordinates of the
       opposite corner.

              {
                  "output": "munich.osm.pbf",
                  "description": "Bounding box specified in array format",
                  "bbox": [11.35, 48.05, 11.73, 48.25]
              }

       The second format uses an object instead of an array:

              {
                  "output": "dresden.osm.pbf",
                  "description": "Bounding box specified in object format",
                  "bbox": {
                      "left": 13.57,
                      "right": 13.97,
                      "top": 51.18,
                      "bottom": 50.97
                  }
              }
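
       The two bbox formats carry the same information.  As a rough
       illustration, the Python sketch below converts the array format into
       the object format.  The function name bbox_to_object is hypothetical,
       and the sketch assumes that each corner is given as longitude first,
       then latitude, as in the --bbox option.

```python
def bbox_to_object(corners):
    """Convert [x1, y1, x2, y2] (two opposite corners) to the object format.

    Assumption of this sketch: x values are longitudes, y values latitudes.
    """
    x1, y1, x2, y2 = corners
    return {
        "left": min(x1, x2),
        "right": max(x1, x2),
        "top": max(y1, y2),
        "bottom": min(y1, y2),
    }


# The corners may be given in any order.
dresden = bbox_to_object([13.97, 51.18, 13.57, 50.97])
```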

       polygon:

       A polygon, either specified inline in the config file or read from an
       external file.  See the (MULTI)POLYGON FILE FORMATS section for
       external files.  If specified inline this is a nested array, the outer
       array defining the polygon, the next array the rings and the innermost
       arrays the coordinates.  This format is the same as in GeoJSON files.

       In this example there is only one outer ring:

              "polygon": [[
                  [9.613465, 53.58071],
                  [9.647599, 53.59655],
                  [9.649288, 53.61059],
                  [9.613465, 53.58071]
              ]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.

       multipolygon:

       A multipolygon, either specified inline in the config file or read
       from an external file.  See the (MULTI)POLYGON FILE FORMATS section
       for external files.  If specified inline this is a nested array, the
       outer array defining the multipolygon, the next array the polygons,
       the next the rings and the innermost arrays the coordinates.  This
       format is the same as in GeoJSON files.

       In this example there is one outer and one inner ring:

              "multipolygon": [[[
                  [6.847, 50.987],
                  [6.910, 51.007],
                  [7.037, 50.953],
                  [6.967, 50.880],
                  [6.842, 50.925],
                  [6.847, 50.987]
              ],[
                  [6.967, 50.954],
                  [6.969, 50.920],
                  [6.932, 50.928],
                  [6.934, 50.950],
                  [6.967, 50.954]
              ]]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.
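
       When inline polygons are produced by a script, the closing rule for
       rings is easy to get wrong.  The Python sketch below closes every ring
       of a polygon or multipolygon given in the nested array layout shown
       above.  The function names are hypothetical helpers, not part of
       osmium.

```python
def close_ring(ring):
    """Return a copy of `ring` whose last coordinate pair equals its first."""
    if len(ring) < 3:
        raise ValueError("a ring needs at least three positions")
    closed = [list(point) for point in ring]
    if closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    return closed


def close_polygon(polygon):
    """Close every ring of a polygon (a list of rings)."""
    return [close_ring(ring) for ring in polygon]


def close_multipolygon(multipolygon):
    """Close every ring of a multipolygon (a list of polygons)."""
    return [close_polygon(polygon) for polygon in multipolygon]


open_polygon = [[[9.613465, 53.58071], [9.647599, 53.59655], [9.649288, 53.61059]]]
polygon = close_polygon(open_polygon)
```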

       Osmium must check every node in the input data and find out which
       bounding boxes or (multi)polygons it is in.  This is very cheap for
       bounding boxes, but more expensive for (multi)polygons, and it becomes
       more expensive the more vertices the (multi)polygon has.  Use bounding
       boxes or simplified polygons where possible.

       Note that bounding boxes or (multi)polygons are not allowed to span
       the -180/180 degree line.  If you need this, cut out the regions on
       each side and use osmium merge to join the resulting files.
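
       The split at the -180/180 degree line can be prepared with a few lines
       of code.  The Python sketch below is only an illustration: the
       function name split_at_antimeridian is hypothetical, and it assumes
       that a box whose west longitude is larger than its east longitude is
       meant to wrap around that line.

```python
def split_at_antimeridian(west, south, east, north):
    """Return one or two boxes as [left, bottom, right, top] lists.

    Assumption of this sketch: west > east means the box wraps around the
    -180/180 degree line and has to be cut into two parts.
    """
    if west <= east:
        return [[west, south, east, north]]
    return [
        [west, south, 180.0, north],   # part west of the line
        [-180.0, south, east, north],  # part east of the line
    ]


boxes = split_at_antimeridian(170, -20, -170, -10)
```

       Each resulting box can then be used for its own extract, and the two
       output files can be joined with osmium merge as described above.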

(MULTI)POLYGON FILE FORMATS

       External files describing a (multi)polygon are specified in the config
       file using the “file_name” and “file_type” properties on the “polygon”
       or “multipolygon” object:

              "polygon": {
                  "file_name": "berlin.geojson",
                  "file_type": "geojson"
              }

       If file names don't start with a slash (/), they are interpreted
       relative to the directory where the config file is.  If the
       “file_type” is missing, Osmium will try to autodetect it from the
       suffix of the “file_name”.

       The following file types are supported:

       geojson
              GeoJSON file containing exactly one Feature of type Polygon or
              MultiPolygon, or a FeatureCollection with the first Feature of
              type Polygon or MultiPolygon.  Everything except the actual
              geometry (of the first Feature) is ignored.

       poly   A poly file as described in
              https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format
              .  This wiki page also mentions several sources for such poly
              files.

       osm    An OSM file containing one or more multipolygon or boundary
              relations together with all the nodes and ways needed.  Any OSM
              file format (XML, PBF, ...) supported by Osmium can be used
              here, but the correct suffix must be used so that the file
              format is detected correctly.  Files for this can easily be
              obtained by searching for the area on OSM and then downloading
              the full relation using a URL like
              https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full
              .  Or you can use osmium getid -r to get a specific relation
              from an OSM file.  Note that both these approaches can get you
              very detailed boundaries which can take quite a while to cut
              out.  Consider simplifying the boundary before use.

       If there are several (multi)polygons in a poly file or OSM file, they
       will be merged.  The (multi)polygons must not overlap, otherwise the
       result is undefined.

STRATEGIES

       osmium extract can use different strategies for creating the extracts.
       Depending on the strategy, different objects will end up in the
       extracts.  The strategies differ in how much memory they need and how
       often they need to read the input file.  The choice of strategy
       depends on how you want to use the generated extracts and how much
       memory and time you have.

       The default strategy is complete_ways.

       Strategy simple
              Runs in a single pass.  The extract will contain all nodes
              inside the region and all ways referencing those nodes as well
              as all relations referencing any nodes or ways already
              included.  Ways crossing the region boundary will not be
              reference-complete.  Relations will not be reference-complete.
              This strategy is fast, because it reads the input only once,
              but the result is not enough for most use cases.  It is the
              only strategy that will work when reading from a socket or
              pipe.  This strategy will not work for history files.

       Strategy complete_ways
              Runs in two passes.  The extract will contain all nodes inside
              the region and all ways referencing those nodes as well as all
              nodes referenced by those ways.  The extract will also contain
              all relations referenced by nodes inside the region or ways
              already included and, recursively, their parent relations.  The
              ways are reference-complete, but the relations are not.

       Strategy smart
              Runs in three passes.  The extract will contain all nodes
              inside the region and all ways referencing those nodes as well
              as all nodes referenced by those ways.  The extract will also
              contain all relations referenced by nodes inside the region or
              ways already included and, recursively, their parent relations.
              The extract will also contain all nodes and ways (and the nodes
              they reference) referenced by relations tagged
              “type=multipolygon” directly referencing any nodes in the
              region or ways referencing nodes in the region.  The ways are
              reference-complete, and all multipolygon relations referencing
              nodes in the region or ways that have nodes in the region are
              reference-complete.  Other relations are not
              reference-complete.

       For the smart strategy you can change the types of relations that
       should be reference-complete.  Instead of just relations tagged
       “type=multipolygon”, you can either get all relations (use “-S
       types=any”) or give a list of types to the -S option: “-S
       types=multipolygon,route”.  Note that boundary relations in particular
       can be huge, so if you include them, be aware that your result might
       be huge.

       The smart strategy allows another option, “-S
       complete-partial-relations=X”.  If this is set, all relations that
       have more than X percent of their members already in the extract will
       have their full set of members in the extract.  This allows completing
       almost complete relations.  It can be useful, for instance, to make
       sure a boundary relation is complete even if some of it is outside the
       polygon used for extraction.
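
       The rule behind complete-partial-relations can be written down as a
       small piece of arithmetic.  The Python sketch below only mirrors the
       wording above (“more than X percent”); how osmium counts members
       internally is not specified here, and the function name
       should_complete is hypothetical.

```python
def should_complete(members_in_extract, members_total, threshold_percent):
    """Return True if more than `threshold_percent` of the members are present.

    Sketch of the rule as worded in the text, not osmium's implementation.
    """
    if members_total <= 0:
        return False
    # Integer-friendly form of: members_in_extract / members_total > X / 100
    return 100 * members_in_extract > threshold_percent * members_total


nearly_complete = should_complete(9, 10, 80)   # 90 percent present
exactly_at_limit = should_complete(8, 10, 80)  # exactly 80 percent present
```

       Under this reading a relation with exactly X percent of its members in
       the extract is not completed, because the text asks for more than X
       percent.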

DIAGNOSTICS

       osmium extract exits with exit code

       0      if everything went alright,

       1      if there was an error processing the data, or

       2      if there was a problem with the command line arguments, config
              file or polygon files.
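
       Scripts that call osmium extract can use these exit codes to decide
       how to react.  The following Python sketch maps the codes listed above
       to short messages.  The names describe_exit_code and run_extract are
       hypothetical helpers, and run_extract assumes that an osmium binary is
       available on the PATH.

```python
import subprocess

# Meanings of the exit codes as listed above.
EXIT_CODES = {
    0: "success",
    1: "error processing the data",
    2: "problem with command line arguments, config file or polygon files",
}


def describe_exit_code(code):
    """Translate an exit code of osmium extract into a short message."""
    return EXIT_CODES.get(code, "unexpected exit code {}".format(code))


def run_extract(arguments):
    """Run `osmium extract` with the given list of arguments.

    Hypothetical wrapper; needs the osmium binary on the PATH.
    """
    result = subprocess.run(["osmium", "extract", *arguments])
    return result.returncode, describe_exit_code(result.returncode)
```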

MEMORY USAGE

       Memory usage of osmium extract depends on the number of extracts and
       on the strategy used.  For the simple strategy it will be at least the
       number of extracts times the highest node ID used divided by 8.  For
       the complete_ways strategy it is twice that, and for the smart
       strategy a bit more.
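
       To get a feeling for the numbers, the rule of thumb above can be
       turned into a small calculation.  The Python sketch below assumes that
       the result is in bytes, which the text does not state explicitly.
       Because “a bit more” is not quantified for the smart strategy, the
       sketch returns the complete_ways figure as a lower bound in that case.
       The function name is hypothetical.

```python
def estimate_min_memory_bytes(num_extracts, highest_node_id,
                              strategy="complete_ways"):
    """Lower bound on memory use following the rule of thumb in the text.

    Assumption of this sketch: the result is a number of bytes.
    """
    base = num_extracts * highest_node_id / 8
    if strategy == "simple":
        return base
    if strategy in ("complete_ways", "smart"):
        # For "smart" the text only says "a bit more" than twice the base,
        # so this is a lower bound rather than an estimate.
        return 2 * base
    raise ValueError("unknown strategy: {}".format(strategy))


# Example with made-up numbers: 10 extracts, highest node ID 8 billion.
simple_bytes = estimate_min_memory_bytes(10, 8_000_000_000, "simple")
complete_ways_bytes = estimate_min_memory_bytes(10, 8_000_000_000)
```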

       If you want to split a large file into many extracts, do this in
       several steps.  First create several larger extracts and then split
       them again and again into smaller pieces.

EXAMPLES

       See the example config files in the extract-example-config directory.
       To try it:

              osmium extract -v -c extract-example-config/extracts.json \
                  germany-latest.osm.pbf

       Extract the city of Karlsruhe using a boundary polygon:

              osmium extract -p karlsruhe-boundary.osm.bz2 germany-latest.osm.pbf \
                  -o karlsruhe.osm.pbf

       Extract the city of Munich using a bounding box:

              osmium extract -b 11.35,48.05,11.73,48.25 germany-latest.osm.pbf \
                  -o munich.osm.pbf

SEE ALSO

       · osmium(1), osmium-file-formats(5), osmium-getid(1), osmium-merge(1)

       · Osmium website (https://osmcode.org/osmium-tool/)

COPYRIGHT

       Copyright (C) 2013-2018 Jochen Topf <jochen@topf.org>.

       License GPLv3+: GNU GPL version 3 or later
       <https://gnu.org/licenses/gpl.html>.  This is free software: you are
       free to change and redistribute it.  There is NO WARRANTY, to the
       extent permitted by law.

CONTACT

       If you have any questions or want to report a bug, please go to
       https://osmcode.org/contact.html

AUTHORS

       Jochen Topf <jochen@topf.org>.

                                    1.10.0                   OSMIUM-EXTRACT(1)