OSMIUM-EXTRACT(1)                                            OSMIUM-EXTRACT(1)

NAME

       osmium-extract - create geographical extracts from an OSM file

SYNOPSIS

       osmium extract --config CONFIG-FILE [OPTIONS] OSM-FILE
       osmium extract --bbox LEFT,BOTTOM,RIGHT,TOP [OPTIONS] OSM-FILE
       osmium extract --polygon POLYGON-FILE [OPTIONS] OSM-FILE

DESCRIPTION

       Create geographical extracts from an OSM data file or an OSM history
       file. The region (geographical extent) can be given as a bounding box
       or as a (multi)polygon.

       There are three ways of calling this command:

       • Specify a config file with the --config/-c option. It can define any
         number of regions you want to cut out. See the CONFIG FILE section
         for details.

       • Specify a bounding box to cut out with the --bbox/-b option.

       • Specify a (multi)polygon to cut out with the --polygon/-p option.

       The input file is assumed to be ordered in the usual order: nodes
       first, then ways, then relations.

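       The ordering requirement can be checked with a short sketch. It
       assumes the object types of the input have already been read into a
       sequence of the strings "node", "way" and "relation"; that
       representation and the helper name is_usual_order are hypothetical,
       and reading a real OSM file would need an OSM library (not shown).

```python
from typing import Iterable

# Hypothetical ranking of OSM object types in the usual file order.
_RANK = {"node": 0, "way": 1, "relation": 2}


def is_usual_order(object_types: Iterable[str]) -> bool:
    """Return True if nodes come first, then ways, then relations."""
    highest = 0
    for obj_type in object_types:
        rank = _RANK[obj_type]
        if rank < highest:
            return False  # e.g. a node that appears after a way
        highest = rank
    return True


print(is_usual_order(["node", "node", "way", "relation"]))  # True
print(is_usual_order(["node", "way", "node"]))              # False
```

       This only checks the order of the object types, which is what the
       paragraph above describes.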
       If the --with-history/-H option is used, the command will work
       correctly for history files. This currently works for the
       complete_ways strategy only; the simple and smart strategies do not
       work with history files. A history extract will contain every version
       of all objects with at least one version in the region. Generating a
       history extract is somewhat slower than a normal data extract.

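       The rule "every version of all objects with at least one version in
       the region" can be illustrated for nodes with the following sketch.
       NodeVersion and history_extract are hypothetical names, ways and
       relations are left out, and this is not how osmium implements it
       internally.

```python
from __future__ import annotations

from typing import Callable, List, NamedTuple


class NodeVersion(NamedTuple):
    id: int
    version: int
    lon: float
    lat: float


def history_extract(versions: List[NodeVersion],
                    in_region: Callable[[float, float], bool]) -> List[NodeVersion]:
    # First pass: IDs of nodes with at least one version inside the region.
    selected = {v.id for v in versions if in_region(v.lon, v.lat)}
    # Second pass: keep every version of those nodes, even versions outside.
    return [v for v in versions if v.id in selected]


def inside(lon: float, lat: float) -> bool:
    return 0 <= lon <= 1 and 0 <= lat <= 1


data = [NodeVersion(1, 1, 0.5, 0.5), NodeVersion(1, 2, 5.0, 5.0),
        NodeVersion(2, 1, 5.0, 5.0)]
print([(v.id, v.version) for v in history_extract(data, inside)])  # [(1, 1), (1, 2)]
```

       Version 2 of node 1 lies outside the region but is still kept,
       because version 1 of the same node lies inside.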
       Osmium will make sure that all nodes on the vertices of the boundary of
       the region will be in the extract, but nodes that happen to lie
       directly on the boundary between those vertices might or might not end
       up in the extract. In almost all cases this will be good enough, but if
       you want to make really sure you got everything, use a small buffer
       around your region.

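       For a bounding box, such a buffer can be added by growing each side by
       a small margin. This is a minimal sketch: buffer_bbox is a hypothetical
       helper, and the margin is assumed to be given in degrees and chosen
       by you.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (left, bottom, right, top)


def buffer_bbox(left: float, bottom: float, right: float, top: float,
                margin: float) -> BBox:
    """Grow a bbox by `margin` degrees on every side, clamped to valid ranges."""
    return (max(left - margin, -180.0), max(bottom - margin, -90.0),
            min(right + margin, 180.0), min(top + margin, 90.0))


print(buffer_bbox(11.35, 48.05, 11.73, 48.25, 0.01))
```

       Polygons need a real geometric buffer operation instead, which is
       beyond this sketch.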
       By default no bounds will be set in the header of the output file. Use
       the --set-bounds option if you need this.

       Note that osmium extract will never clip any OSM objects, i.e. it will
       not remove node references outside the region from ways or unused
       relation members from relations. This means you might get objects that
       are not reference-complete. It has the advantage that you can use
       osmium merge to merge several extracts without problems.

OPTIONS

       -b, --bbox=LONG1,LAT1,LONG2,LAT2
              Set the bounding box to cut out. Cannot be used with
              --polygon/-p, --config/-c, or --directory/-d. The coordinates
              LONG1,LAT1 are from one arbitrary corner, the coordinates
              LONG2,LAT2 are from the opposite corner.

       -c, --config=FILE
              Set the name of the config file. Cannot be used with the
              --bbox/-b or --polygon/-p option. If this is set, the
              --output/-o and --output-format/-f options are ignored, because
              they are set in the config file.

       -d, --directory=DIRECTORY
              Output directory. Output file names in the config file are
              relative to this directory. Overrides the setting of the same
              name in the config file. This option is ignored when the
              --bbox/-b or --polygon/-p options are used; in that case, set
              the output directory and name with the --output/-o option.

       -H, --with-history
              Specify that the input file is a history file. The output
              file(s) will also be history file(s).

       -p, --polygon=POLYGON_FILE
              Set the polygon to cut out based on the contents of the file.
              The file has to be a GeoJSON, poly, or OSM file as described in
              the (MULTI)POLYGON FILE FORMATS section. It has to have the
              right suffix to be detected correctly. Cannot be used with
              --bbox/-b, --config/-c, or --directory/-d.

       -s, --strategy=STRATEGY
              Use the given strategy to extract the region. For possible
              values and details see the STRATEGIES section. Default is
              “complete_ways”.

       -S, --option=OPTION=VALUE
              Set a named option for the strategy. If needed you can specify
              this option multiple times to set several options.

       --set-bounds
              Set the bounds field in the header. The bounds are set to the
              bbox or envelope of the polygon specified for the extract. Note
              that strategies other than “simple” can put nodes outside those
              bounds into the output file.
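       The coordinate handling of --bbox and --set-bounds can be sketched as
       follows: two arbitrary opposite corners are sorted into left, bottom,
       right, top, and the envelope of a polygon ring is the smallest such
       box around all its vertices. parse_bbox_option and envelope are
       hypothetical helpers for illustration, not osmium's own code.

```python
from typing import Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (left, bottom, right, top)


def parse_bbox_option(value: str) -> BBox:
    """Parse a --bbox value "LONG1,LAT1,LONG2,LAT2" given by opposite corners."""
    parts = [float(p) for p in value.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated numbers")
    lon1, lat1, lon2, lat2 = parts
    # The corners are arbitrary, so sort each axis.
    return (min(lon1, lon2), min(lat1, lat2), max(lon1, lon2), max(lat1, lat2))


def envelope(ring: Sequence[Sequence[float]]) -> BBox:
    """Smallest bbox containing all [lon, lat] vertices of a ring."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return (min(lons), min(lats), max(lons), max(lats))


print(parse_bbox_option("11.73,48.25,11.35,48.05"))  # (11.35, 48.05, 11.73, 48.25)
```

       For a multipolygon, the envelope would be taken over the vertices of
       all its outer rings.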

COMMON OPTIONS

       -h, --help
              Show usage help.

       -v, --verbose
              Set verbose mode. The program will output information about
              what it is doing to STDERR.

INPUT OPTIONS

       -F, --input-format=FORMAT
              The format of the input file(s). Can be used to set the input
              format if it can’t be autodetected from the file name(s). This
              will set the format for all input files; there is no way to set
              the format for only some input files. See
              osmium-file-formats(5) or the libosmium manual for details.

OUTPUT OPTIONS

       -f, --output-format=FORMAT
              The format of the output file. Can be used to set the output
              file format if it can’t be autodetected from the output file
              name. See osmium-file-formats(5) or the libosmium manual for
              details.

       --fsync
              Call fsync after writing the output file to force flushing
              buffers to disk.

       --generator=NAME
              The name and version of the program generating the output file.
              It will be added to the header of the output file. Default is
              “osmium/” and the version of osmium.

       -o, --output=FILE
              Name of the output file. Default is ‘-’ (STDOUT).

       -O, --overwrite
              Allow an existing output file to be overwritten. Normally
              osmium will refuse to write over an existing file.

       --output-header=OPTION=VALUE
              Add output header option. This command line option can be used
              multiple times for different OPTIONs. See the libosmium manual
              for a list of available header options. For some commands you
              can use the special format “OPTION!” (i.e. an exclamation mark
              after the OPTION and no value set) to set the value to the same
              as in the input file.

CONFIG FILE

       The config file mainly specifies the file names and the regions of the
       extracts that should be created.

       The config file is in JSON format. The top level is an object which
       contains at least an “extracts” array. It can also contain a
       “directory” entry which names the directory where all the output files
       will be created:

              {
                  "extracts": [...],
                  "directory": "/tmp/"
              }

       The extracts array specifies the extracts that should be created. Each
       item in the array is an object with at least a name “output” naming the
       output file and a region defined in a “bbox”, “polygon”, or
       “multipolygon” name. An optional “description” can be added; it will
       not be used by the program but can help with documenting the file
       contents. You can add an optional “output_format” if the format cannot
       be detected from the “output” file name. Run “osmium help
       file-formats” to get a description of allowed formats.

       The optional “output_header” allows you to set additional OSM file
       header settings such as the “generator”. If you set the value of a
       file header setting to null, the output header will get the value of
       that setting from the input file.

              "extracts": [
                  {
                      "output": "hamburg.osm.pbf",
                      "output_format": "pbf",
                      "description": "optional description",
                      "bbox": ...
                  },
                  {
                      "output": "berlin.osm.pbf",
                      "description": "optional description",
                      "polygon": ...
                  },
                  {
                      "output": "munich.osm.pbf",
                      "output_header": {
                          "generator": "MyExtractor/1.0",
                          "osmosis_replication_timestamp": null
                      },
                      "description": "optional description",
                      "multipolygon": ...
                  }
              ]

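       Because the config file is plain JSON, it can also be generated by a
       script. This sketch reuses the bounding boxes shown in this section;
       the variable names are illustrative only. Note how Python's None
       becomes JSON null, which asks osmium to copy that header setting from
       the input file.

```python
import json

config = {
    "directory": "/tmp/",
    "extracts": [
        {
            "output": "munich.osm.pbf",
            "description": "Bounding box specified in array format",
            "bbox": [11.35, 48.05, 11.73, 48.25],
        },
        {
            "output": "dresden.osm.pbf",
            "output_format": "pbf",
            "output_header": {
                "generator": "MyExtractor/1.0",
                # None is written as null: take this header value from the input.
                "osmosis_replication_timestamp": None,
            },
            "bbox": {"left": 13.57, "right": 13.97, "top": 51.18, "bottom": 50.97},
        },
    ],
}

config_text = json.dumps(config, indent=4)
print(config_text)
```

       Redirect the output into a file and pass that file to the --config/-c
       option.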
       There are several formats for specifying the regions:

       bbox:

       A bounding box in one of two formats. The first is a simple array with
       four real numbers, the first two specifying the coordinates of an
       arbitrary corner, the second two specifying the coordinates of the
       opposite corner.

              {
                  "output": "munich.osm.pbf",
                  "description": "Bounding box specified in array format",
                  "bbox": [11.35, 48.05, 11.73, 48.25]
              }

       The second format uses an object instead of an array:

              {
                  "output": "dresden.osm.pbf",
                  "description": "Bounding box specified in object format",
                  "bbox": {
                      "left": 13.57,
                      "right": 13.97,
                      "top": 51.18,
                      "bottom": 50.97
                  }
              }

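       A program that reads such config files has to accept both bbox forms.
       The following is a minimal sketch under that assumption;
       bbox_from_config is a hypothetical name, and it sorts the corners only
       for the array form, because only that form is described as using
       arbitrary corners.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Sequence, Tuple, Union

BBox = Tuple[float, float, float, float]  # (left, bottom, right, top)


def bbox_from_config(value: Union[Sequence[float], Mapping]) -> BBox:
    """Return (left, bottom, right, top) from either config bbox format."""
    if isinstance(value, Mapping):
        # Object form: named edges.
        return (value["left"], value["bottom"], value["right"], value["top"])
    # Array form: two arbitrary opposite corners, so sort each axis.
    lon1, lat1, lon2, lat2 = value
    return (min(lon1, lon2), min(lat1, lat2), max(lon1, lon2), max(lat1, lat2))


print(bbox_from_config([11.35, 48.05, 11.73, 48.25]))
print(bbox_from_config({"left": 13.57, "right": 13.97, "top": 51.18, "bottom": 50.97}))
```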
       polygon:

       A polygon, either specified inline in the config file or read from an
       external file. See the (MULTI)POLYGON FILE FORMATS section for
       external files. If specified inline, this is a nested array: the outer
       array defines the polygon, the next arrays the rings, and the innermost
       arrays the coordinates. This format is the same as in GeoJSON files.

       In this example there is only one outer ring:

              "polygon": [[
                  [9.613465, 53.58071],
                  [9.647599, 53.59655],
                  [9.649288, 53.61059],
                  [9.613465, 53.58071]
              ]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.

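       When rings are generated by a script, closing them can be automated.
       This sketch uses a hypothetical helper close_ring that appends the
       first coordinate to a ring whose ends differ; the coordinates are the
       ones from the example above, without the closing point.

```python
from typing import List

Ring = List[List[float]]  # list of [lon, lat] pairs


def close_ring(ring: Ring) -> Ring:
    """Return the ring, with its first coordinate repeated at the end if missing."""
    if ring[0] != ring[-1]:
        return ring + [ring[0]]
    return ring


polygon = [[
    [9.613465, 53.58071],
    [9.647599, 53.59655],
    [9.649288, 53.61059],
]]
polygon = [close_ring(r) for r in polygon]
print(polygon[0][-1])  # [9.613465, 53.58071]
```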
       multipolygon:

       A multipolygon, either specified inline in the config file or read from
       an external file. See the (MULTI)POLYGON FILE FORMATS section for
       external files. If specified inline, this is a nested array: the outer
       array defines the multipolygon, the next arrays the polygons, the next
       the rings, and the innermost arrays the coordinates. This format is the
       same as in GeoJSON files.

       In this example there is one outer and one inner ring:

              "multipolygon": [[[
                  [6.847, 50.987],
                  [6.910, 51.007],
                  [7.037, 50.953],
                  [6.967, 50.880],
                  [6.842, 50.925],
                  [6.847, 50.987]
              ],[
                  [6.967, 50.954],
                  [6.969, 50.920],
                  [6.932, 50.928],
                  [6.934, 50.950],
                  [6.967, 50.954]
              ]]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.

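       The extra level of nesting can be walked as shown in this sketch,
       which reports every ring that is not closed. unclosed_rings is a
       hypothetical helper; it follows the GeoJSON convention that the first
       ring of each polygon is the outer ring and any further rings are
       inner rings.

```python
from typing import List, Tuple


def unclosed_rings(multipolygon: list) -> List[Tuple[int, int]]:
    """Return (polygon_index, ring_index) for every ring whose ends differ."""
    problems = []
    for p, polygon in enumerate(multipolygon):    # level 1: polygons
        for r, ring in enumerate(polygon):        # level 2: rings (outer first)
            if ring[0] != ring[-1]:               # level 3: [lon, lat] pairs
                problems.append((p, r))
    return problems


example = [[
    [[6.847, 50.987], [6.910, 51.007], [7.037, 50.953],
     [6.967, 50.880], [6.842, 50.925], [6.847, 50.987]],
    [[6.967, 50.954], [6.969, 50.920], [6.932, 50.928],
     [6.934, 50.950], [6.967, 50.954]],
]]
print(unclosed_rings(example))  # []
```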
       Osmium must check each and every node in the input data and find out
       which bounding boxes or (multi)polygons it is in. This is very cheap
       for bounding boxes, but more expensive for (multi)polygons, and it
       becomes more expensive the more vertices the (multi)polygon has. Use
       bounding boxes or simplified polygons where possible.

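       The difference in cost can be seen by comparing the two tests. This
       sketch uses the textbook even-odd (ray casting) point-in-polygon test,
       which is not necessarily how osmium implements it: the bbox test is
       four comparisons, while the ring test visits every edge, so its cost
       grows with the number of vertices. Inner rings are not handled here.

```python
from typing import Sequence


def in_bbox(lon: float, lat: float,
            left: float, bottom: float, right: float, top: float) -> bool:
    # Constant cost, independent of the size of the region.
    return left <= lon <= right and bottom <= lat <= top


def in_ring(lon: float, lat: float, ring: Sequence[Sequence[float]]) -> bool:
    """Even-odd test for a closed ring of [lon, lat] pairs."""
    inside = False
    # One iteration per edge: cost is linear in the number of vertices.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # the ray towards the east crosses this edge
                inside = not inside
    return inside


square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
print(in_bbox(0.5, 0.5, 0, 0, 1, 1), in_ring(0.5, 0.5, square))  # True True
```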
       Note that bounding boxes or (multi)polygons are not allowed to span the
       -180/180 degree line. If you need this, cut out the regions on each
       side and use osmium merge to join the resulting files.
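       Computing the two boxes for such a region can be sketched as below.
       This assumes a hypothetical convention in which a region crossing the
       line is described by a west edge that is larger than its east edge
       (e.g. from 170 to -175). osmium itself does not accept such a box, and
       --bbox treats its corners as arbitrary, so only the two resulting
       boxes should be handed to osmium.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (left, bottom, right, top)


def split_at_antimeridian(west: float, south: float,
                          east: float, north: float) -> List[BBox]:
    """Split a box running eastwards from `west` to `east` across 180/-180."""
    if west <= east:
        return [(west, south, east, north)]  # does not cross: nothing to split
    return [(west, south, 180.0, north), (-180.0, south, east, north)]


for box in split_at_antimeridian(170.0, -20.0, -175.0, -10.0):
    print(",".join(str(c) for c in box))
# 170.0,-20.0,180.0,-10.0
# -180.0,-20.0,-175.0,-10.0
```

       Each printed line can be used as a --bbox value for a separate
       extract, and the two results can then be joined with osmium merge.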

(MULTI)POLYGON FILE FORMATS

       External files describing a (multi)polygon are specified in the config
       file using the “file_name” and “file_type” properties on the “polygon”
       or “multipolygon” object:

              "polygon": {
                  "file_name": "berlin.geojson",
                  "file_type": "geojson"
              }

       If file names don’t start with a slash (/), they are interpreted
       relative to the directory where the config file is. If the
       “file_type” is missing, Osmium will try to autodetect it from the
       suffix of the “file_name”.

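       The path rule can be sketched with pathlib. resolve_polygon_file and
       guess_file_type are hypothetical helpers, and the suffix-to-type table
       in guess_file_type is an assumption for illustration only; osmium's
       own autodetection may recognize different suffixes.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_polygon_file(file_name: str, config_path: str) -> str:
    """Names not starting with '/' are relative to the config file's directory."""
    path = PurePosixPath(file_name)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(config_path).parent / path)


def guess_file_type(file_name: str) -> Optional[str]:
    """Assumed mapping from suffix to file type; None if nothing matches."""
    suffixes = [s.lower() for s in PurePosixPath(file_name).suffixes]
    if not suffixes:
        return None
    if suffixes[-1] in (".geojson", ".json"):
        return "geojson"
    if suffixes[-1] == ".poly":
        return "poly"
    if ".osm" in suffixes or suffixes[-1] in (".pbf", ".opl"):
        return "osm"  # e.g. .osm, .osm.pbf, .osm.bz2
    return None


print(resolve_polygon_file("berlin.geojson", "/etc/extracts/extracts.json"))
# /etc/extracts/berlin.geojson
```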
       The following file types are supported:

       geojson
              GeoJSON file containing exactly one Feature of type Polygon or
              MultiPolygon, or a FeatureCollection with the first Feature of
              type Polygon or MultiPolygon. Everything except the actual
              geometry (of the first Feature) is ignored.

       poly   A poly file as described in
              https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format .
              This wiki page also mentions several sources for such poly
              files.

       osm    An OSM file containing one or more multipolygon or boundary
              relations together with all the nodes and ways needed. Any OSM
              file format (XML, PBF, ...) supported by Osmium can be used
              here, but the correct suffix must be used so that the file
              format is detected correctly. Files for this can easily be
              obtained by searching for the area on OSM and then downloading
              the full relation using a URL like
              https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full .
              Or you can use osmium getid -r to get a specific relation from
              an OSM file. Note that both these approaches can get you very
              detailed boundaries which can take quite a while to cut out.
              Consider simplifying the boundary before use.

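       To see what these formats contain, the following sketch reads a
       poly file into GeoJSON-style multipolygon coordinates and picks the
       geometry out of a GeoJSON document. parse_poly and geojson_geometry
       are hypothetical helpers, not osmium code. parse_poly makes a
       simplifying assumption: a section whose name starts with '!' is
       treated as a hole in the most recent outer ring.

```python
from __future__ import annotations

from typing import List

Ring = List[List[float]]  # list of [lon, lat] pairs


def parse_poly(text: str) -> List[List[Ring]]:
    """Parse an Osmosis .poly file into a list of polygons (lists of rings)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    polygons: List[List[Ring]] = []
    i = 1  # lines[0] is the name of the whole file
    while lines[i] != "END":  # a lone END closes the file
        is_hole = lines[i].startswith("!")
        i += 1  # skip the section name
        ring: Ring = []
        while lines[i] != "END":
            lon, lat = (float(v) for v in lines[i].split()[:2])
            ring.append([lon, lat])
            i += 1
        i += 1  # skip the END that closes this section
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # close the ring, GeoJSON-style
        if is_hole:
            polygons[-1].append(ring)  # simplification: hole of last outer ring
        else:
            polygons.append([ring])
    return polygons


def geojson_geometry(doc: dict) -> dict:
    """Geometry of a Feature, or of the first Feature of a FeatureCollection."""
    feature = doc["features"][0] if doc.get("type") == "FeatureCollection" else doc
    geometry = feature["geometry"]
    if geometry["type"] not in ("Polygon", "MultiPolygon"):
        raise ValueError("expected a Polygon or MultiPolygon geometry")
    return geometry


sample = """sample
area1
  0.0 0.0
  1.0 0.0
  1.0 1.0
  0.0 1.0
END
!hole
  0.2 0.2
  0.8 0.2
  0.5 0.8
END
END
"""
print(len(parse_poly(sample)[0]))  # 2 (one outer and one inner ring)
```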
       If there are several (multi)polygons in a poly file or OSM file, they
       will be merged. The (multi)polygons must not overlap, otherwise the
       result is undefined.

STRATEGIES

       osmium extract can use different strategies for creating the extracts.
       Depending on the strategy, different objects will end up in the
       extracts. The strategies differ in how much memory they need and how
       often they need to read the input file. The choice of strategy depends
       on how you want to use the generated extracts and how much memory and
       time you have.

       The default strategy is complete_ways.

       Strategy simple
              Runs in a single pass. The extract will contain all nodes
              inside the region and all ways referencing those nodes as well
              as all relations referencing any nodes or ways already
              included. Ways crossing the region boundary will not be
              reference-complete. Relations will not be reference-complete.
              This strategy is fast, because it reads the input only once,
              but the result is not enough for most use cases. It is the only
              strategy that will work when reading from a socket or pipe.
              This strategy will not work for history files.

       Strategy complete_ways
              Runs in two passes. The extract will contain all nodes inside
              the region and all ways referencing those nodes as well as all
              nodes referenced by those ways. The extract will also contain
              all relations referencing nodes inside the region or ways
              already included and, recursively, their parent relations. The
              ways are reference-complete, but the relations are not.

       Strategy smart
              Runs in three passes. The extract will contain all nodes inside
              the region and all ways referencing those nodes as well as all
              nodes referenced by those ways. The extract will also contain
              all relations referencing nodes inside the region or ways
              already included and, recursively, their parent relations. The
              extract will also contain all nodes and ways (and the nodes
              they reference) referenced by relations tagged
              “type=multipolygon” that directly reference any nodes in the
              region or ways referencing nodes in the region. The ways are
              reference-complete, and all multipolygon relations referencing
              nodes in the region or ways that have nodes in the region are
              reference-complete. Other relations are not reference-complete.

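       The difference between the simple and complete_ways strategies for
       ways can be shown with a toy in-memory model. This is only an
       illustration under stated assumptions: relations are ignored, the data
       is held in hypothetical dictionaries, and the real strategies work by
       reading the file in passes rather than by looping over dictionaries.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Set, Tuple

Nodes = Dict[int, Tuple[float, float]]  # node ID -> (lon, lat)
Ways = Dict[int, List[int]]             # way ID -> referenced node IDs
Inside = Callable[[float, float], bool]


def simple(nodes: Nodes, ways: Ways, inside: Inside) -> Tuple[Set[int], Set[int]]:
    """Nodes inside the region plus all ways referencing any of them."""
    node_ids = {n for n, (lon, lat) in nodes.items() if inside(lon, lat)}
    way_ids = {w for w, refs in ways.items() if any(r in node_ids for r in refs)}
    return node_ids, way_ids


def complete_ways(nodes: Nodes, ways: Ways, inside: Inside) -> Tuple[Set[int], Set[int]]:
    """Like simple, then also add every node referenced by a selected way."""
    node_ids, way_ids = simple(nodes, ways, inside)
    for w in way_ids:
        node_ids.update(ways[w])  # may add nodes outside the region
    return node_ids, way_ids


def unit_square(lon: float, lat: float) -> bool:
    return 0 <= lon <= 1 and 0 <= lat <= 1


nodes = {1: (0.5, 0.5), 2: (2.0, 0.5), 3: (3.0, 3.0)}
ways = {10: [1, 2], 11: [2, 3]}
print(simple(nodes, ways, unit_square))         # ({1}, {10})
print(complete_ways(nodes, ways, unit_square))  # ({1, 2}, {10})
```

       Way 10 crosses the region boundary: with simple it lacks node 2 and is
       not reference-complete, with complete_ways it is.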
       For the smart strategy you can change the types of relations that
       should be reference-complete. Instead of just relations tagged
       “type=multipolygon”, you can either get all relations (use
       “-S types=any”) or give a list of types to the -S option:
       “-S types=multipolygon,route”. Note that boundary relations in
       particular can be huge, so if you include them, your result might be
       huge too.

       The smart strategy allows another option,
       “-S complete-partial-relations=X”. If this is set, all relations that
       have more than X percent of their members already in the extract will
       have their full set of members in the extract. This allows completing
       almost complete relations. It can be useful, for instance, to make
       sure a boundary relation is complete even if some of it is outside the
       polygon used for extraction.
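       How these -S values could be interpreted is sketched below. The
       helper names are hypothetical and this is not osmium's option parser;
       it only mirrors the rules stated above: repeated OPTION=VALUE pairs,
       “types=any” meaning all relation types, “multipolygon” as the default
       type, and completion when strictly more than X percent of the members
       are already in the extract.

```python
from typing import Dict, Iterable, Optional, Set


def parse_strategy_options(values: Iterable[str]) -> Dict[str, str]:
    """Collect repeated -S OPTION=VALUE arguments into a dict."""
    options = {}
    for item in values:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected OPTION=VALUE, got {item!r}")
        options[key] = value
    return options


def relation_types(options: Dict[str, str]) -> Optional[Set[str]]:
    """Relation types to make reference-complete; None means all types."""
    value = options.get("types", "multipolygon")
    return None if value == "any" else set(value.split(","))


def should_complete(members_in_extract: int, total_members: int,
                    options: Dict[str, str]) -> bool:
    """True if more than X percent of the relation's members are included."""
    threshold = options.get("complete-partial-relations")
    if threshold is None or total_members == 0:
        return False
    return 100.0 * members_in_extract / total_members > float(threshold)


opts = parse_strategy_options(["types=multipolygon,route",
                               "complete-partial-relations=90"])
print(should_complete(19, 20, opts), should_complete(9, 10, opts))  # True False
```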

DIAGNOSTICS

       osmium extract exits with exit code

       0      if everything went alright,

       1      if there was an error processing the data, or

       2      if there was a problem with the command line arguments, config
              file or polygon files.
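       When osmium extract is called from a script, the exit code can be
       mapped to these meanings. This is a minimal sketch using Python's
       subprocess module; describe_exit_code and run_extract are hypothetical
       helpers, and run_extract requires osmium to be installed and on the
       PATH.

```python
import subprocess
from typing import List

EXIT_MEANINGS = {
    0: "everything went alright",
    1: "error processing the data",
    2: "problem with the command line arguments, config file or polygon files",
}


def describe_exit_code(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_extract(args: List[str]) -> int:
    """Run `osmium extract` with the given arguments and report the outcome."""
    result = subprocess.run(["osmium", "extract", *args])
    print(describe_exit_code(result.returncode))
    return result.returncode
```

       For example, run_extract(["-c", "extracts.json",
       "germany-latest.osm.pbf"]) runs a config-file based extract and prints
       what its exit code means.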

MEMORY USAGE

       Memory usage of osmium extract depends on the number of extracts and on
       the strategy used. For the simple strategy it will be at least the
       number of extracts times the highest node ID used, divided by 8 (in
       bytes). For the complete_ways strategy it is twice that, and for the
       smart strategy a bit more.

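       The rule of thumb above can be turned into a quick lower-bound
       estimate. This sketch reads the division by 8 as one bit per node ID
       per extract. Because no formula is given for the smart strategy, the
       sketch returns the complete_ways figure for it as a lower bound. The
       highest node ID in the example is a hypothetical round number.

```python
def min_memory_bytes(num_extracts: int, highest_node_id: int, strategy: str) -> int:
    """Lower bound on memory use according to the rule of thumb above."""
    base = num_extracts * highest_node_id // 8  # one bit per node ID per extract
    if strategy == "simple":
        return base
    if strategy in ("complete_ways", "smart"):
        return 2 * base  # smart needs "a bit more" than this
    raise ValueError(f"unknown strategy {strategy!r}")


estimate = min_memory_bytes(10, 10_000_000_000, "complete_ways")
print(f"{estimate / 1e9:.1f} GB")  # 25.0 GB
```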
       If you want to split a large file into many extracts, do this in
       several steps. First create several larger extracts and then split
       them again and again into smaller pieces.
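       One way to prepare such a stepwise split is to divide a bounding box
       into a grid and write a config file for each step. This is a sketch
       under assumptions: a regular grid is only one possible choice of
       regions, and grid, config_for and the output name prefix are
       hypothetical.

```python
import json
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (left, bottom, right, top)


def grid(bbox: BBox, cols: int, rows: int) -> List[BBox]:
    """Split a bbox into cols x rows equally sized sub-boxes."""
    left, bottom, right, top = bbox
    dx = (right - left) / cols
    dy = (top - bottom) / rows
    return [(left + c * dx, bottom + r * dy, left + (c + 1) * dx, bottom + (r + 1) * dy)
            for r in range(rows) for c in range(cols)]


def config_for(boxes: List[BBox], prefix: str) -> str:
    """Build an extracts config with one bbox extract per box."""
    extracts = [{"output": f"{prefix}-{i}.osm.pbf", "bbox": list(b)}
                for i, b in enumerate(boxes)]
    return json.dumps({"extracts": extracts}, indent=4)


print(config_for(grid((0.0, 0.0, 4.0, 4.0), 2, 2), "part"))
```

       Run osmium extract with the first config on the large file, then
       repeat the process on each of the resulting files with a finer grid.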

EXAMPLES

       See the example config files in the extract-example-config directory.
       To try it:

              osmium extract -v -c extract-example-config/extracts.json \
                  germany-latest.osm.pbf

       Extract the city of Karlsruhe using a boundary polygon:

              osmium extract -p karlsruhe-boundary.osm.bz2 germany-latest.osm.pbf \
                  -o karlsruhe.osm.pbf

       Extract the city of Munich using a bounding box:

              osmium extract -b 11.35,48.05,11.73,48.25 germany-latest.osm.pbf \
                  -o munich.osm.pbf

SEE ALSO

       • osmium(1), osmium-file-formats(5), osmium-getid(1), osmium-merge(1)

       • Osmium website (https://osmcode.org/osmium-tool/)

COPYRIGHT

       Copyright (C) 2013-2021 Jochen Topf <jochen@topf.org>.

       License GPLv3+: GNU GPL version 3 or later
       <https://gnu.org/licenses/gpl.html>. This is free software: you are
       free to change and redistribute it. There is NO WARRANTY, to the extent
       permitted by law.

CONTACT

       If you have any questions or want to report a bug, please go to
       https://osmcode.org/contact.html

AUTHORS

       Jochen Topf <jochen@topf.org>.

                                   1.13.1                   OSMIUM-EXTRACT(1)