OSMIUM-EXTRACT(1)                                            OSMIUM-EXTRACT(1)

NAME

       osmium-extract - create geographical extracts from an OSM file

SYNOPSIS

       osmium extract --config CONFIG-FILE [OPTIONS] OSM-FILE
       osmium extract --bbox LEFT,BOTTOM,RIGHT,TOP [OPTIONS] OSM-FILE
       osmium extract --polygon POLYGON-FILE [OPTIONS] OSM-FILE

DESCRIPTION

       Create geographical extracts from an OSM data file or an OSM history
       file.  The region (geographical extent) can be given as a bounding box
       or as a (multi)polygon.

       There are three ways of calling this command:

       • Specify a config file with the --config/-c option.  It can define any
         number of regions you want to cut out.  See the CONFIG FILE section
         for details.

       • Specify a bounding box to cut out with the --bbox/-b option.

       • Specify a (multi)polygon to cut out with the --polygon/-p option.

       The input file is assumed to be ordered in the usual order: nodes
       first, then ways, then relations.

       If the --with-history/-H option is used, the command will work
       correctly for history files.  This currently works for the
       complete_ways strategy only.  The simple or smart strategies do not
       work with history files.  A history extract will contain every version
       of all objects with at least one version in the region.  Generating a
       history extract is somewhat slower than a normal data extract.

       Osmium will make sure that all nodes on the vertices of the boundary of
       the region will be in the extract, but nodes that happen to be directly
       on the boundary, but between those vertices, might end up in the
       extract or not.  In almost all cases this will be good enough, but if
       you want to make really sure you got everything, use a small buffer
       around your region.
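
       The advice about a small buffer can be sketched in Python as follows.
       This is only an illustration: the helper name buffered_bbox and the
       default margin of 0.001 degrees are assumptions, not part of osmium.
       The helper also sorts the two corners, since a bounding box may be
       given starting from any corner.

```python
def buffered_bbox(lon1, lat1, lon2, lat2, margin=0.001):
    """Return a LEFT,BOTTOM,RIGHT,TOP string grown by `margin` degrees.

    Hypothetical helper; the margin value is an arbitrary assumption.
    """
    # The two corners may be given in any order, so sort each axis.
    left, right = sorted((lon1, lon2))
    bottom, top = sorted((lat1, lat2))
    # Grow the box, but clamp it to the valid coordinate range.
    left = max(left - margin, -180.0)
    bottom = max(bottom - margin, -90.0)
    right = min(right + margin, 180.0)
    top = min(top + margin, 90.0)
    return f"{left:.7f},{bottom:.7f},{right:.7f},{top:.7f}"
```

       The returned string can be passed as the value of the --bbox/-b
       option.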

       By default no bounds will be set in the header of the output file.  Use
       the --set-bounds option if you need this.

       Note that osmium extract will never clip any OSM objects, i.e. it will
       not remove node references outside the region from ways or unused
       relation members from relations.  This means you might get objects that
       are not reference-complete.  It has the advantage that you can use
       osmium merge to merge several extracts without problems.

OPTIONS

       -b, --bbox=LONG1,LAT1,LONG2,LAT2
              Set the bounding box to cut out.  Cannot be used with
              --polygon/-p, --config/-c, or --directory/-d.  The coordinates
              LONG1,LAT1 are from one arbitrary corner, the coordinates
              LONG2,LAT2 are from the opposite corner.

       -c, --config=FILE
              Set the name of the config file.  Cannot be used with the
              --bbox/-b or --polygon/-p option.  If this is set, the
              --output/-o and --output-format/-f options are ignored, because
              they are set in the config file.

       --clean=ATTR
              Clean the attribute (version, timestamp, changeset, uid, user)
              from the data before writing it out again.  The attribute will
              be set to 0 (the user will be set to the empty string).  This
              option can be given multiple times.  Depending on the output
              format these attributes might show up as 0 or not show up at
              all.

       -d, --directory=DIRECTORY
              Output directory.  Output file names in the config file are
              relative to this directory.  Overrides the setting of the same
              name in the config file.  This option is ignored when the
              --bbox/-b or --polygon/-p options are used; set the output
              directory and name with the --output/-o option in that case.

       -H, --with-history
              Specify that the input file is a history file.  The output
              file(s) will also be history file(s).

       -p, --polygon=POLYGON_FILE
              Set the polygon to cut out based on the contents of the file.
              The file has to be a GeoJSON, poly, or OSM file as described in
              the (MULTI)POLYGON FILE FORMATS section.  It has to have the
              right suffix to be detected correctly.  Cannot be used with
              --bbox/-b, --config/-c, or --directory/-d.

       -s, --strategy=STRATEGY
              Use the given strategy to extract the region.  For possible
              values and details see the STRATEGIES section.  Default is
              “complete_ways”.

       -S, --option=OPTION=VALUE
              Set a named option for the strategy.  If needed you can specify
              this option multiple times to set several options.

       --set-bounds
              Set the bounds field in the header.  The bounds are set to the
              bbox or envelope of the polygon specified for the extract.  Note
              that strategies other than “simple” can put nodes outside those
              bounds into the output file.

COMMON OPTIONS

       -h, --help
              Show usage help.

       -v, --verbose
              Set verbose mode.  The program will output information about
              what it is doing to STDERR.

INPUT OPTIONS

       -F, --input-format=FORMAT
              The format of the input file(s).  Can be used to set the input
              format if it can’t be autodetected from the file name(s).  This
              will set the format for all input files; there is no way to set
              the format for some input files only.  See
              osmium-file-formats(5) or the libosmium manual for details.

OUTPUT OPTIONS

       -f, --output-format=FORMAT
              The format of the output file.  Can be used to set the output
              file format if it can’t be autodetected from the output file
              name.  See osmium-file-formats(5) or the libosmium manual for
              details.

       --fsync
              Call fsync after writing the output file to force flushing
              buffers to disk.

       --generator=NAME
              The name and version of the program generating the output file.
              It will be added to the header of the output file.  Default is
              “osmium/” and the version of osmium.

       -o, --output=FILE
              Name of the output file.  Default is `-' (STDOUT).

       -O, --overwrite
              Allow an existing output file to be overwritten.  Normally
              osmium will refuse to write over an existing file.

       --output-header=OPTION=VALUE
              Add output header option.  This command line option can be used
              multiple times for different OPTIONs.  See the
              osmium-output-headers(5) man page for a list of available header
              options.  For some commands you can use the special format
              “OPTION!” (i.e. an exclamation mark after the OPTION and no
              value set) to set the value to the same as in the input file.

CONFIG FILE

       The config file mainly specifies the file names and the regions of the
       extracts that should be created.

       The config file is in JSON format.  The top level is an object which
       contains at least an “extracts” array.  It can also contain a
       “directory” entry which names the directory where all the output files
       will be created:

              {
                  "extracts": [...],
                  "directory": "/tmp/"
              }

       The extracts array specifies the extracts that should be created.  Each
       item in the array is an object with at least a name “output” naming the
       output file and a region defined in a “bbox”, “polygon” or
       “multipolygon” name.  An optional “description” can be added; it will
       not be used by the program but can help with documenting the file
       contents.  You can add an optional “output_format” if the format cannot
       be detected from the “output” file name.  Run “osmium help
       file-formats” to get a description of allowed formats.

       The optional “output_header” allows you to set additional OSM file
       header settings such as the “generator”.  If you set the value of a
       file header setting to null, the output header will be set to the same
       header from the input file.

              "extracts": [
                  {
                      "output": "hamburg.osm.pbf",
                      "output_format": "pbf",
                      "description": "optional description",
                      "bbox": ...
                  },
                  {
                      "output": "berlin.osm.pbf",
                      "description": "optional description",
                      "polygon": ...
                  },
                  {
                      "output": "munich.osm.pbf",
                      "output_header": {
                          "generator": "MyExtractor/1.0",
                          "osmosis_replication_timestamp": null
                      },
                      "description": "optional description",
                      "multipolygon": ...
                  }
              ]

       There are several formats for specifying the regions:

       bbox:

       A bounding box in one of two formats.  The first is a simple array with
       four real numbers, the first two specifying the coordinates of an
       arbitrary corner, the second two specifying the coordinates of the
       opposite corner.

              {
                  "output": "munich.osm.pbf",
                  "description": "Bounding box specified in array format",
                  "bbox": [11.35, 48.05, 11.73, 48.25]
              }

       The second format uses an object instead of an array:

              {
                  "output": "dresden.osm.pbf",
                  "description": "Bounding box specified in object format",
                  "bbox": {
                      "left": 13.57,
                      "right": 13.97,
                      "top": 51.18,
                      "bottom": 50.97
                  }
              }
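
       When many extracts are needed, the config file can be generated
       instead of written by hand.  The following Python sketch builds a
       config with the two bounding box formats described above.  The
       function name make_config is hypothetical; only the JSON keys
       (“extracts”, “directory”, “output”, “bbox”) come from this manual.

```python
import json


def make_config(extracts, directory=None):
    """Serialize a list of extract objects into config file text."""
    config = {"extracts": extracts}
    if directory is not None:
        # Optional top-level entry naming the output directory.
        config["directory"] = directory
    return json.dumps(config, indent=4)


config_text = make_config(
    [
        # Array format: two opposite corners as four numbers.
        {"output": "munich.osm.pbf", "bbox": [11.35, 48.05, 11.73, 48.25]},
        # Object format: named edges.
        {
            "output": "dresden.osm.pbf",
            "bbox": {"left": 13.57, "right": 13.97, "top": 51.18, "bottom": 50.97},
        },
    ],
    directory="/tmp/",
)
```

       Writing config_text to a file gives something that can be passed to
       the --config/-c option.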

       polygon:

       A polygon, either specified inline in the config file or read from an
       external file.  See the (MULTI)POLYGON FILE FORMATS section for
       external files.  If specified inline this is a nested array, the outer
       array defining the polygon, the next array the rings and the innermost
       arrays the coordinates.  This format is the same as in GeoJSON files.

       In this example there is only one outer ring:

              "polygon": [[
                  [9.613465, 53.58071],
                  [9.647599, 53.59655],
                  [9.649288, 53.61059],
                  [9.613465, 53.58071]
              ]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.
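
       The closing rule for rings is easy to get wrong when coordinates come
       from another tool.  A minimal Python sketch that closes a ring if
       needed is shown here; close_ring is a hypothetical helper and not
       something osmium provides.

```python
def close_ring(ring):
    """Return a copy of `ring` whose last coordinate pair equals the first."""
    if len(ring) < 3:
        raise ValueError("a ring needs at least three points")
    closed = [list(point) for point in ring]
    if closed[0] != closed[-1]:
        # Repeat the first point at the end to close the ring.
        closed.append(list(closed[0]))
    return closed


ring = close_ring([
    [9.613465, 53.58071],
    [9.647599, 53.59655],
    [9.649288, 53.61059],
])
```

       Here ring ends up with four coordinate pairs, the last one repeating
       the first, as in the inline example above.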

       multipolygon:

       A multipolygon, either specified inline in the config file or read from
       an external file.  See the (MULTI)POLYGON FILE FORMATS section for
       external files.  If specified inline this is a nested array, the outer
       array defining the multipolygon, the next array the polygons, the next
       the rings and the innermost arrays the coordinates.  This format is the
       same as in GeoJSON files.

       In this example there is one outer and one inner ring:

              "multipolygon": [[[
                  [6.847, 50.987],
                  [6.910, 51.007],
                  [7.037, 50.953],
                  [6.967, 50.880],
                  [6.842, 50.925],
                  [6.847, 50.987]
              ],[
                  [6.967, 50.954],
                  [6.969, 50.920],
                  [6.932, 50.928],
                  [6.934, 50.950],
                  [6.967, 50.954]
              ]]]

       In each ring, the last set of coordinates should be the same as the
       first set, closing the ring.

       Osmium must check each and every node in the input data and find out in
       which bounding boxes or (multi)polygons this node is.  This is very
       cheap for bounding boxes, but more expensive for (multi)polygons, and
       it becomes more expensive the more vertices the (multi)polygon has.
       Use bounding boxes or simplified polygons where possible.

       Note that bounding boxes or (multi)polygons are not allowed to span the
       -180/180 degree line.  If you need this, cut out the regions on each
       side and use osmium merge to join the resulting files.
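
       For a bounding box, the split at the -180/180 degree line can be
       computed mechanically.  The Python sketch below assumes the box is
       given by its west, south, east and north edges and that a west value
       larger than the east value means the box crosses the line.  That
       convention and the helper name are assumptions of this sketch, not
       something osmium itself understands.

```python
def split_at_antimeridian(west, south, east, north):
    """Split a box into parts that each stay within -180..180 degrees.

    Assumption of this sketch: west > east marks a box that crosses
    the -180/180 degree line.
    """
    if west <= east:
        # Does not cross the line; nothing to split.
        return [(west, south, east, north)]
    return [
        (west, south, 180.0, north),   # part on the eastern hemisphere side
        (-180.0, south, east, north),  # part on the western hemisphere side
    ]


parts = split_at_antimeridian(170.0, -20.0, -170.0, -10.0)
```

       Each resulting box can then be extracted separately and the output
       files joined with osmium merge, as described above.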

(MULTI)POLYGON FILE FORMATS

       External files describing a (multi)polygon are specified in the config
       file using the “file_name” and “file_type” properties on the “polygon”
       or “multipolygon” object:

              "polygon": {
                  "file_name": "berlin.geojson",
                  "file_type": "geojson"
              }

       If file names don’t start with a slash (/), they are interpreted
       relative to the directory where the config file is.  If the “file_type”
       is missing, Osmium will try to autodetect it from the suffix of the
       “file_name”.
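
       The rule for relative file names can be mirrored in a script that
       checks a config before running osmium.  This Python sketch only
       restates the rule above; resolve_polygon_file is a hypothetical
       helper and the paths are made-up examples.

```python
import posixpath


def resolve_polygon_file(config_path, file_name):
    """Return the path a "file_name" entry refers to."""
    if file_name.startswith("/"):
        # Names starting with a slash are used as they are.
        return file_name
    # Other names are relative to the directory of the config file.
    return posixpath.join(posixpath.dirname(config_path), file_name)


resolved = resolve_polygon_file("/data/configs/extracts.json", "berlin.geojson")
```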

       The following file types are supported:

       geojson
              GeoJSON file containing exactly one Feature of type Polygon or
              MultiPolygon, or a FeatureCollection with the first Feature of
              type Polygon or MultiPolygon.  Everything except the actual
              geometry (of the first Feature) is ignored.

       poly   A poly file as described in
              https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format .
              This wiki page also mentions several sources for such poly
              files.

       osm    An OSM file containing one or more multipolygon or boundary
              relation together with all the nodes and ways needed.  Any OSM
              file format (XML, PBF, ...) supported by Osmium can be used
              here, but the correct suffix must be used, so the file format is
              detected correctly.  Files for this can easily be obtained by
              searching for the area on OSM and then downloading the full
              relation using a URL like
              https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full .
              Or you can use osmium getid -r to get a specific relation from
              an OSM file.  Note that both these approaches can get you very
              detailed boundaries which can take quite a while to cut out.
              Consider simplifying the boundary before use.

       If there are several (multi)polygons in a poly file or OSM file, they
       will be merged.  The (multi)polygons must not overlap, otherwise the
       result is undefined.

STRATEGIES

       osmium extract can use different strategies for creating the extracts.
       Depending on the strategy different objects will end up in the
       extracts.  The strategies differ in how much memory they need and how
       often they need to read the input file.  The choice of strategy depends
       on how you want to use the generated extracts and how much memory and
       time you have.

       The default strategy is complete_ways.

       Strategy simple
              Runs in a single pass.  The extract will contain all nodes
              inside the region and all ways referencing those nodes as well
              as all relations referencing any nodes or ways already included.
              Ways crossing the region boundary will not be
              reference-complete.  Relations will not be reference-complete.
              This strategy is fast, because it reads the input only once, but
              the result is not enough for most use cases.  It is the only
              strategy that will work when reading from a socket or pipe.
              This strategy will not work for history files.

       Strategy complete_ways
              Runs in two passes.  The extract will contain all nodes inside
              the region and all ways referencing those nodes as well as all
              nodes referenced by those ways.  The extract will also contain
              all relations referenced by nodes inside the region or ways
              already included and, recursively, their parent relations.  The
              ways are reference-complete, but the relations are not.

       Strategy smart
              Runs in three passes.  The extract will contain all nodes inside
              the region and all ways referencing those nodes as well as all
              nodes referenced by those ways.  The extract will also contain
              all relations referenced by nodes inside the region or ways
              already included and, recursively, their parent relations.  The
              extract will also contain all nodes and ways (and the nodes they
              reference) referenced by relations tagged “type=multipolygon”
              directly referencing any nodes in the region or ways referencing
              nodes in the region.  The ways are reference-complete, and all
              multipolygon relations referencing nodes in the regions or ways
              that have nodes in the region are reference-complete.  Other
              relations are not reference-complete.

       For the complete_ways strategy you can set the option “-S
       relations=false” in which case no relations will be written to the
       output file.

       For the smart strategy you can change the types of relations that
       should be reference-complete.  Instead of just relations tagged
       “type=multipolygon”, you can either get all relations (use “-S
       types=any”) or give a list of types to the -S option: “-S
       types=multipolygon,route”.  Note that especially boundary relations can
       be huge, so if you include them, be aware your result might be huge.

       The smart strategy allows another option “-S
       complete-partial-relations=X”.  If this is set, all relations that have
       more than X percent of their members already in the extract will have
       their full set of members in the extract.  So this allows completing
       almost complete relations.  It can be useful for instance to make sure
       a boundary relation is complete even if some of it is outside the
       polygon used for extraction.
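
       When osmium extract is driven from a script, the strategy and its -S
       options can be assembled into an argument list.  The Python sketch
       below does only that; it uses the --bbox, -s, -S and -o options
       documented on this page, while the function name and the file names
       are made up for illustration.

```python
def extract_command(input_file, output_file, bbox,
                    strategy="complete_ways", options=None):
    """Build an argument list for one bounding box extract."""
    cmd = ["osmium", "extract", "--bbox", bbox, "-s", strategy]
    # Each strategy option becomes its own "-S NAME=VALUE" pair.
    for name, value in (options or {}).items():
        cmd += ["-S", f"{name}={value}"]
    cmd += ["-o", output_file, input_file]
    return cmd


cmd = extract_command(
    "germany-latest.osm.pbf",
    "munich.osm.pbf",
    "11.35,48.05,11.73,48.25",
    strategy="smart",
    options={"types": "multipolygon,route", "complete-partial-relations": 90},
)
```

       The list can be handed to subprocess.run(cmd) on a system where
       osmium is installed.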

DIAGNOSTICS

       osmium extract exits with exit code

       0      if everything went alright,

       1      if there was an error processing the data, or

       2      if there was a problem with the command line arguments, config
              file or polygon files.
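
       A calling script can turn these exit codes into messages.  The
       following Python sketch maps the three documented codes; the helper
       name and the wording for any other code are assumptions.

```python
EXIT_MEANINGS = {
    0: "everything went alright",
    1: "error processing the data",
    2: "problem with the command line arguments, config file or polygon files",
}


def describe_exit_code(code):
    """Translate an osmium extract exit code into a short message."""
    # Codes other than 0, 1 and 2 are not documented on this page.
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")
```

       For example, describe_exit_code(subprocess.run(cmd).returncode) gives
       the message for a finished run, where cmd is an osmium extract
       argument list.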

MEMORY USAGE

       Memory usage of osmium extract depends on the number of extracts and on
       the strategy used.  For the simple strategy it will at least be the
       number of extracts times the highest node ID used divided by 8.  For
       the complete_ways strategy it is twice that, and for the smart strategy
       a bit more.
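
       The rule of thumb above can be written down as a small Python
       function.  It assumes that the result is a number of bytes (the
       division by 8 suggests one bit per node ID), and it returns only a
       lower bound for the smart strategy because this page gives no exact
       figure for it.  The function name and the node ID used in the example
       are made up.

```python
def min_memory_bytes(num_extracts, highest_node_id, strategy="complete_ways"):
    """Lower bound on memory use, following the rule of thumb in the text."""
    # Assumed to be bytes: extracts times highest node ID, divided by 8.
    base = num_extracts * highest_node_id // 8
    if strategy == "simple":
        return base
    if strategy in ("complete_ways", "smart"):
        # smart needs "a bit more" than this; no exact figure is documented.
        return 2 * base
    raise ValueError(f"unknown strategy: {strategy}")


# Example with an assumed highest node ID of 8 billion and 10 extracts.
estimate = min_memory_bytes(10, 8_000_000_000, "complete_ways")
```

       With these assumed numbers the estimate is 20 000 000 000 bytes,
       which shows why splitting in several steps helps.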

       If you want to split a large file into many extracts, do this in
       several steps.  First create several larger extracts and then split
       them again and again into smaller pieces.

EXAMPLES

       See the example config files in the extract-example-config directory.
       To try it:

              osmium extract -v -c extract-example-config/extracts.json \
                  germany-latest.osm.pbf

       Extract the city of Karlsruhe using a boundary polygon:

              osmium extract -p karlsruhe-boundary.osm.bz2 germany-latest.osm.pbf \
                  -o karlsruhe.osm.pbf

       Extract the city of Munich using a bounding box:

              osmium extract -b 11.35,48.05,11.73,48.25 germany-latest.osm.pbf \
                  -o munich.osm.pbf

SEE ALSO

       • osmium(1), osmium-file-formats(5), osmium-output-headers(5),
         osmium-getid(1), osmium-merge(1)

       • Osmium website (https://osmcode.org/osmium-tool/)

COPYRIGHT

       Copyright (C) 2013-2022 Jochen Topf <jochen@topf.org>.

       License GPLv3+: GNU GPL version 3 or later
       <https://gnu.org/licenses/gpl.html>.  This is free software: you are
       free to change and redistribute it.  There is NO WARRANTY, to the
       extent permitted by law.

CONTACT

       If you have any questions or want to report a bug, please go to
       https://osmcode.org/contact.html

AUTHORS

       Jochen Topf <jochen@topf.org>.

                                    1.14.0                   OSMIUM-EXTRACT(1)