OSMIUM-EXTRACT(1)                                            OSMIUM-EXTRACT(1)

NAME

       osmium-extract - create geographical extracts from an OSM file

SYNOPSIS

       osmium extract --config CONFIG-FILE [OPTIONS] OSM-FILE
       osmium extract --bbox LEFT,BOTTOM,RIGHT,TOP [OPTIONS] OSM-FILE
       osmium extract --polygon POLYGON-FILE [OPTIONS] OSM-FILE

DESCRIPTION

       Create geographical extracts from an OSM data file or an OSM history
       file. The region (geographical extent) can be given as a bounding box
       or as a (multi)polygon.

       There are three ways of calling this command:

       · Specify a config file with the --config/-c option. It can define any
         number of regions you want to cut out. See the CONFIG FILE section
         for details.

       · Specify a bounding box to cut out with the --bbox/-b option.

       · Specify a (multi)polygon to cut out with the --polygon/-p option.

       The input file is assumed to be sorted in the usual order: nodes
       first, then ways, then relations.

       If the --with-history/-H option is used, the command will work
       correctly for history files. This currently works for the
       complete_ways strategy only; the simple and smart strategies do not
       work with history files. A history extract will contain every version
       of all objects with at least one version in the region. Generating a
       history extract is somewhat slower than generating a normal data
       extract.

       Osmium makes sure that all nodes located on the vertices of the region
       boundary end up in the extract. Nodes that lie directly on the
       boundary between two vertices may or may not be included. In almost
       all cases this is good enough, but if you want to be really sure you
       got everything, add a small buffer around your region.

       By default no bounds will be set in the header of the output file. Use
       the --set-bounds option if you need them.

       Note that osmium extract never clips any OSM objects, i.e. it does not
       remove node references outside the region from ways, or members
       outside the region from relations. This means you might get objects
       that are not reference-complete. The advantage is that you can use
       osmium merge to merge several extracts without problems.

OPTIONS

       -b, --bbox=LONG1,LAT1,LONG2,LAT2
              Set the bounding box to cut out. Cannot be used with
              --polygon/-p, --config/-c, or --directory/-d. The coordinates
              LONG1,LAT1 are those of an arbitrary corner, the coordinates
              LONG2,LAT2 those of the opposite corner.

       -c, --config=FILE
              Set the name of the config file. Cannot be used with the
              --bbox/-b or --polygon/-p options. If this is set, the
              --output/-o and --output-format/-f options are ignored, because
              the output files and formats are set in the config file.

       -d, --directory=DIRECTORY
              Output directory. Output file names in the config file are
              relative to this directory. Overrides the setting of the same
              name in the config file. This option is ignored when the
              --bbox/-b or --polygon/-p options are used; in that case, set
              the output directory and name with the --output/-o option.

       -H, --with-history
              Specify that the input file is a history file. The output
              file(s) will also be history file(s).

       -p, --polygon=POLYGON_FILE
              Set the polygon to cut out based on the contents of the file.
              The file has to be a GeoJSON, poly, or OSM file as described in
              the (MULTI)POLYGON FILE FORMATS section. It must have the
              correct suffix so that its format is detected. Cannot be used
              with --bbox/-b, --config/-c, or --directory/-d.

       -s, --strategy=STRATEGY
              Use the given strategy to extract the region. For possible
              values and details see the STRATEGIES section. Default is
              “complete_ways”.

       -S, --option=OPTION=VALUE
              Set a named option for the strategy. If needed, you can specify
              this option multiple times to set several options.

       --set-bounds
              Set the bounds field in the header. The bounds are set to the
              bbox or the envelope of the polygon specified for the extract.
              Note that strategies other than “simple” can put nodes outside
              those bounds into the output file.

COMMON OPTIONS

       -h, --help
              Show usage help.

       -v, --verbose
              Set verbose mode. The program will output information about
              what it is doing to STDERR.

INPUT OPTIONS

       -F, --input-format=FORMAT
              The format of the input file(s). Can be used to set the input
              format if it can’t be autodetected from the file name(s). This
              will set the format for all input files; there is no way to set
              the format for some input files only. See osmium-file-formats(5)
              or the libosmium manual for details.

OUTPUT OPTIONS

       -f, --output-format=FORMAT
              The format of the output file. Can be used to set the output
              file format if it can’t be autodetected from the output file
              name. See osmium-file-formats(5) or the libosmium manual for
              details.

       --fsync
              Call fsync after writing the output file to force flushing
              buffers to disk.

       --generator=NAME
              The name and version of the program generating the output file.
              It will be added to the header of the output file. Default is
              “osmium/” followed by the version of osmium.

       -o, --output=FILE
              Name of the output file. Default is `-' (STDOUT).

       -O, --overwrite
              Allow an existing output file to be overwritten. Normally
              osmium will refuse to write over an existing file.

       --output-header=OPTION=VALUE
              Add output header option. This command line option can be used
              multiple times for different OPTIONs. See the libosmium manual
              for a list of available header options. For some commands you
              can use the special format “OPTION!” (i.e. an exclamation mark
              after the OPTION and no value set) to set the value to the same
              as in the input file.

CONFIG FILE

       The config file mainly specifies the file names and the regions of the
       extracts that should be created.

       The config file is in JSON format. The top level is an object that
       contains at least an “extracts” array. It can also contain a
       “directory” entry naming the directory where all the output files will
       be created:

              {
                  "extracts": [...],
                  "directory": "/tmp/"
              }

       The extracts array specifies the extracts that should be created. Each
       item in the array is an object with at least an “output” entry naming
       the output file and a region defined by a “bbox”, “polygon”, or
       “multipolygon” entry. An optional “description” can be added; it is
       not used by the program but can help document the file contents. You
       can add an optional “output_format” if the format cannot be detected
       from the “output” file name. Run “osmium help file-formats” to get a
       description of allowed formats. The optional “output_header” allows
       you to set additional OSM file header settings such as the
       “generator”.

              "extracts": [
                  {
                      "output": "hamburg.osm.pbf",
                      "output_format": "pbf",
                      "description": "optional description",
                      "bbox": ...
                  },
                  {
                      "output": "berlin.osm.pbf",
                      "description": "optional description",
                      "polygon": ...
                  },
                  {
                      "output": "munich.osm.pbf",
                      "output_header": {
                          "generator": "MyExtractor/1.0"
                      },
                      "description": "optional description",
                      "multipolygon": ...
                  }
              ]

       There are several formats for specifying the regions:

       bbox:

       A bounding box in one of two formats. The first is a simple array with
       four real numbers, the first two specifying the coordinates
       (longitude, latitude) of an arbitrary corner, the second two
       specifying the coordinates of the opposite corner.

              {
                  "output": "munich.osm.pbf",
                  "description": "Bounding box specified in array format",
                  "bbox": [11.35, 48.05, 11.73, 48.25]
              }

       The second format uses an object instead of an array:

              {
                  "output": "dresden.osm.pbf",
                  "description": "Bounding box specified in object format",
                  "bbox": {
                      "left": 13.57,
                      "right": 13.97,
                      "top": 51.18,
                      "bottom": 50.97
                  }
              }
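
       Because any two opposite corners may be given, the array
       [11.73, 48.25, 11.35, 48.05] describes the same box as the array
       example above. The POSIX shell sketch below shows how two arbitrary
       corners map onto the “left”, “bottom”, “right” and “top” values of the
       object format. The function normalize_bbox is a hypothetical helper
       made up for this illustration; it is not part of osmium, and osmium
       does not require the corners to be normalized.

```sh
# normalize_bbox: hypothetical helper, NOT part of osmium.
# Input:  "LON1,LAT1,LON2,LAT2" with the two opposite corners in any order.
# Output: "LEFT,BOTTOM,RIGHT,TOP", i.e. the values of the object format.
normalize_bbox() {
    printf '%s\n' "$1" | awk -F, '{
        left = $1; bottom = $2; right = $3; top = $4
        # Compare numerically (+ 0) but keep the original text for output.
        if (left + 0 > right + 0) { t = left; left = right; right = t }
        if (bottom + 0 > top + 0) { t = bottom; bottom = top; top = t }
        print left "," bottom "," right "," top
    }'
}

normalize_bbox "11.73,48.25,11.35,48.05"   # prints 11.35,48.05,11.73,48.25
```

       For example, the Dresden box given as the corner pair 13.97,50.97 and
       13.57,51.18 comes out as 13.57,50.97,13.97,51.18, which matches the
       object format example above.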

       polygon:

       A polygon, either specified inline in the config file or read from an
       external file. See the (MULTI)POLYGON FILE FORMATS section for
       external files. If specified inline, this is a nested array: the outer
       array defines the polygon, the next level the rings, and the innermost
       arrays the coordinates. This format is the same as in GeoJSON files.

       In this example there is only one outer ring:

              "polygon": [[
                  [9.613465, 53.58071],
                  [9.647599, 53.59655],
                  [9.649288, 53.61059],
                  [9.613465, 53.58071]
              ]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.

       multipolygon:

       A multipolygon, either specified inline in the config file or read
       from an external file. See the (MULTI)POLYGON FILE FORMATS section for
       external files. If specified inline, this is a nested array: the outer
       array defines the multipolygon, the next level the polygons, the next
       the rings, and the innermost arrays the coordinates. This format is
       the same as in GeoJSON files.

       In this example there is one outer and one inner ring:

              "multipolygon": [[[
                  [6.847, 50.987],
                  [6.910, 51.007],
                  [7.037, 50.953],
                  [6.967, 50.880],
                  [6.842, 50.925],
                  [6.847, 50.987]
              ],[
                  [6.967, 50.954],
                  [6.969, 50.920],
                  [6.932, 50.928],
                  [6.934, 50.950],
                  [6.967, 50.954]
              ]]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.
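
       The region formats can be combined in a single config file. The shell
       sketch below writes a complete config file with one extract for each
       format shown above. The file name extracts.json and the output
       directory extracts are arbitrary example values, and the output file
       names of the two polygon extracts are made up for this illustration.

```sh
# Output directory for the extracts (example value), created up front.
mkdir -p extracts

# Complete config file combining the region formats described above.
cat > extracts.json <<'EOF'
{
    "extracts": [
        {
            "output": "munich.osm.pbf",
            "description": "Bounding box specified in array format",
            "bbox": [11.35, 48.05, 11.73, 48.25]
        },
        {
            "output": "dresden.osm.pbf",
            "description": "Bounding box specified in object format",
            "bbox": {
                "left": 13.57,
                "right": 13.97,
                "top": 51.18,
                "bottom": 50.97
            }
        },
        {
            "output": "polygon-example.osm.pbf",
            "description": "Inline polygon with one outer ring",
            "polygon": [[
                [9.613465, 53.58071],
                [9.647599, 53.59655],
                [9.649288, 53.61059],
                [9.613465, 53.58071]
            ]]
        },
        {
            "output": "multipolygon-example.osm.pbf",
            "description": "Inline multipolygon with one outer and one inner ring",
            "multipolygon": [[[
                [6.847, 50.987],
                [6.910, 51.007],
                [7.037, 50.953],
                [6.967, 50.880],
                [6.842, 50.925],
                [6.847, 50.987]
            ],[
                [6.967, 50.954],
                [6.969, 50.920],
                [6.932, 50.928],
                [6.934, 50.950],
                [6.967, 50.954]
            ]]]
        }
    ]
}
EOF
```

       The output directory can then be given on the command line with the
       --directory/-d option:

              osmium extract -v -c extracts.json -d extracts germany-latest.osm.pbf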

       Osmium must check every node in the input data to find out which
       bounding boxes or (multi)polygons it is in. This is very cheap for
       bounding boxes but more expensive for (multi)polygons, and the cost
       grows with the number of vertices of the (multi)polygon. Use bounding
       boxes or simplified polygons where possible.

       Note that bounding boxes and (multi)polygons are not allowed to cross
       the -180/180 degree line. If you need this, cut out the regions on
       each side and use osmium merge to join the resulting files.
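
       A minimal sketch of that workaround for bounding boxes: the
       hypothetical shell function split_antimeridian below (not part of
       osmium) takes a box written as WEST,BOTTOM,EAST,TOP whose western
       longitude is larger than its eastern one, i.e. a box that crosses the
       -180/180 line, and prints the two boxes on either side of that line.
       This input convention is an assumption made for the illustration;
       osmium itself does not accept such a box.

```sh
# split_antimeridian: hypothetical helper, NOT part of osmium.
# Input:  "WEST,BOTTOM,EAST,TOP" with WEST > EAST, e.g. "170,-20,-170,-10".
# Output: two boxes in LEFT,BOTTOM,RIGHT,TOP form, one per line:
#         WEST..180 first, then -180..EAST.
split_antimeridian() {
    printf '%s\n' "$1" | awk -F, '{
        if ($1 + 0 <= $3 + 0) {
            print "box does not cross the -180/180 line" > "/dev/stderr"
            exit 1
        }
        print $1 "," $2 ",180," $4
        print "-180," $2 "," $3 "," $4
    }'
}

split_antimeridian "170,-20,-170,-10"
# 170,-20,180,-10
# -180,-20,-170,-10
```

       Each box can then be cut out separately and the results merged. The
       --bbox= form avoids any chance of a leading minus sign being read as
       an option; the file names are placeholders:

              osmium extract --bbox=170,-20,180,-10 input.osm.pbf -o part1.osm.pbf
              osmium extract --bbox=-180,-20,-170,-10 input.osm.pbf -o part2.osm.pbf
              osmium merge part1.osm.pbf part2.osm.pbf -o region.osm.pbf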

(MULTI)POLYGON FILE FORMATS

       External files describing a (multi)polygon are specified in the config
       file using the “file_name” and “file_type” properties on the “polygon”
       or “multipolygon” object:

              "polygon": {
                  "file_name": "berlin.geojson",
                  "file_type": "geojson"
              }

       If file names don’t start with a slash (/), they are interpreted
       relative to the directory where the config file is. If the
       “file_type” is missing, Osmium will try to autodetect it from the
       suffix of the “file_name”.
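
       For example, the shell sketch below writes a GeoJSON file with a
       single Polygon Feature (using the ring from the inline polygon example
       in the CONFIG FILE section) next to a config file that refers to it by
       a relative name. The directory regions and all file names are
       placeholders chosen for this illustration.

```sh
# Hypothetical layout: the config file and the GeoJSON file share a directory.
mkdir -p regions

# GeoJSON file with exactly one Feature whose geometry is a Polygon.
cat > regions/area.geojson <<'EOF'
{
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [9.613465, 53.58071],
            [9.647599, 53.59655],
            [9.649288, 53.61059],
            [9.613465, 53.58071]
        ]]
    }
}
EOF

# "area.geojson" does not start with a slash, so it is interpreted relative
# to the directory of the config file, i.e. as regions/area.geojson.
cat > regions/config.json <<'EOF'
{
    "extracts": [
        {
            "output": "area.osm.pbf",
            "polygon": {
                "file_name": "area.geojson",
                "file_type": "geojson"
            }
        }
    ]
}
EOF
```

       The config file is then used as usual, for example with
       “osmium extract -c regions/config.json input.osm.pbf”. Given the
       .geojson suffix, leaving out “file_type” should also work through
       autodetection.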

       The following file types are supported:

       geojson
              GeoJSON file containing exactly one Feature with a geometry of
              type Polygon or MultiPolygon, or a FeatureCollection whose
              first Feature has a geometry of type Polygon or MultiPolygon.
              Everything except the actual geometry (of the first Feature) is
              ignored.

       poly   A poly file as described in
              https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format .
              This wiki page also mentions several sources for such poly
              files.

       osm    An OSM file containing one or more multipolygon or boundary
              relations together with all the nodes and ways needed. Any OSM
              file format (XML, PBF, ...) supported by Osmium can be used
              here, but the correct suffix must be used so that the file
              format is detected correctly. Files for this can easily be
              obtained by searching for the area on OSM and then downloading
              the full relation using a URL like
              https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full .
              Alternatively, you can use osmium getid -r to get a specific
              relation from an OSM file. Note that both these approaches can
              get you very detailed boundaries, which can take quite a while
              to cut out. Consider simplifying the boundary before use.

       If there are several (multi)polygons in a poly file or OSM file, they
       will be merged. The (multi)polygons must not overlap, otherwise the
       result is undefined.
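
       As an illustration of the poly format, the shell sketch below writes a
       poly file with two sections, taken from the outer rings of the polygon
       and multipolygon examples in the CONFIG FILE section. In this format
       the first line is a name, each section starts with a section name and
       ends with END, a final END closes the file, and each coordinate line
       holds longitude then latitude. (A section whose name starts with !
       describes a hole; none is used here.) Since the two rings do not
       overlap, osmium should merge them into one multipolygon. The file
       name two-areas.poly is a placeholder.

```sh
# Poly file with two non-overlapping sections (outer rings only).
cat > two-areas.poly <<'EOF'
two_areas
1
    9.613465    53.58071
    9.647599    53.59655
    9.649288    53.61059
    9.613465    53.58071
END
2
    6.847    50.987
    6.910    51.007
    7.037    50.953
    6.967    50.880
    6.842    50.925
    6.847    50.987
END
END
EOF
```

       The .poly suffix lets the --polygon/-p option detect the format:

              osmium extract -p two-areas.poly input.osm.pbf -o two-areas.osm.pbf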

STRATEGIES

       osmium extract can use different strategies for creating the extracts.
       Depending on the strategy, different objects will end up in the
       extracts. The strategies differ in how much memory they need and how
       often they need to read the input file. The choice of strategy depends
       on how you want to use the generated extracts and how much memory and
       time you have.

       The default strategy is complete_ways.

       Strategy simple
              Runs in a single pass. The extract will contain all nodes
              inside the region and all ways referencing those nodes, as well
              as all relations referencing any nodes or ways already
              included. Ways crossing the region boundary will not be
              reference-complete. Relations will not be reference-complete.
              This strategy is fast, because it reads the input only once,
              but the result is not sufficient for most use cases. It is the
              only strategy that will work when reading from a socket or
              pipe. This strategy will not work for history files.

       Strategy complete_ways
              Runs in two passes. The extract will contain all nodes inside
              the region and all ways referencing those nodes, as well as all
              nodes referenced by those ways. The extract will also contain
              all relations referencing nodes inside the region or ways
              already included and, recursively, their parent relations. The
              ways are reference-complete, but the relations are not.

       Strategy smart
              Runs in three passes. The extract will contain all nodes inside
              the region and all ways referencing those nodes, as well as all
              nodes referenced by those ways. The extract will also contain
              all relations referencing nodes inside the region or ways
              already included and, recursively, their parent relations. The
              extract will also contain all nodes and ways (and the nodes
              they reference) referenced by relations tagged
              “type=multipolygon” that directly reference any nodes in the
              region or ways referencing nodes in the region. The ways are
              reference-complete, and all multipolygon relations referencing
              nodes in the region or ways that have nodes in the region are
              reference-complete. Other relations are not reference-complete.

       For the smart strategy you can change the types of relations that
       should be reference-complete. Instead of just relations tagged
       “type=multipolygon”, you can either get all relations (use
       “-S types=any”) or give a list of types to the -S option:
       “-S types=multipolygon,route”. Note that boundary relations in
       particular can be huge, so if you include them, your result might be
       huge as well.

       The smart strategy allows another option,
       “-S complete-partial-relations=X”. If this is set, all relations that
       have more than X percent of their members already in the extract will
       have their full set of members in the extract. This allows completing
       almost complete relations. It can be useful, for instance, to make
       sure a boundary relation is complete even if some of it is outside the
       polygon used for extraction.

DIAGNOSTICS

       osmium extract exits with exit code

       0      if everything went alright,

       1      if there was an error processing the data, or

       2      if there was a problem with the command line arguments, config
              file or polygon files.

MEMORY USAGE

       Memory usage of osmium extract depends on the number of extracts and on
       the strategy used. For the simple strategy it will be at least the
       number of extracts times the highest node ID used, divided by 8 (in
       bytes). For the complete_ways strategy it is twice that, and for the
       smart strategy a bit more.
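
       To put numbers on these rules, the hypothetical shell function below
       (not part of osmium) computes the lower bound described above: the
       number of extracts times the highest node ID divided by 8 for the
       simple strategy, and twice that for complete_ways. Because “a bit
       more” is not quantified, the smart strategy is reported with the
       complete_ways figure, which therefore underestimates it. The node ID
       used in the example is an arbitrary round number, not a measured
       value.

```sh
# estimate_min_bytes: hypothetical helper, NOT part of osmium.
# Usage: estimate_min_bytes STRATEGY NUM_EXTRACTS HIGHEST_NODE_ID
estimate_min_bytes() {
    # Divisor 8: roughly one bit per possible node ID and extract.
    bytes=$(( $2 * $3 / 8 ))
    case "$1" in
        simple) ;;
        # complete_ways needs twice that; smart needs "a bit more" still.
        complete_ways|smart) bytes=$(( bytes * 2 )) ;;
        *) echo "unknown strategy: $1" >&2; return 1 ;;
    esac
    echo "$bytes"
}

estimate_min_bytes simple 10 10000000000          # prints 12500000000
estimate_min_bytes complete_ways 10 10000000000   # prints 25000000000
```

       So ten extracts from a file whose highest node ID is around 10 billion
       would need at least about 12.5 GB with the simple strategy and about
       25 GB with complete_ways.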

       If you want to split a large file into many extracts, do this in
       several steps. First create several larger extracts and then split
       them again and again into smaller pieces.

EXAMPLES

       See the example config files in the extract-example-config directory.
       To try it:

              osmium extract -v -c extract-example-config/extracts.json \
                  germany-latest.osm.pbf

       Extract the city of Karlsruhe using a boundary polygon:

              osmium extract -p karlsruhe-boundary.osm.bz2 germany-latest.osm.pbf \
                  -o karlsruhe.osm.pbf

       Extract the city of Munich using a bounding box:

              osmium extract -b 11.35,48.05,11.73,48.25 germany-latest.osm.pbf \
                  -o munich.osm.pbf

SEE ALSO

       · osmium(1), osmium-file-formats(5), osmium-getid(1), osmium-merge(1)

       · Osmium website (https://osmcode.org/osmium-tool/)

COPYRIGHT

       Copyright (C) 2013-2020 Jochen Topf <jochen@topf.org>.

       License GPLv3+: GNU GPL version 3 or later
       <https://gnu.org/licenses/gpl.html>. This is free software: you are
       free to change and redistribute it. There is NO WARRANTY, to the
       extent permitted by law.

CONTACT

       If you have any questions or want to report a bug, please go to
       https://osmcode.org/contact.html

AUTHORS

       Jochen Topf <jochen@topf.org>.

                                    1.12.1                   OSMIUM-EXTRACT(1)