Importing OpenStreetMap Planet OSM with imposm3 and osmosis

A guide for importing Planet OSM into PostGIS

Posted by Tobias Begalke on Wed Sep 28 2016

Here’s a quick run-down of how I set up my own OpenStreetMap server using PostGIS, Osmosis and imposm3 on Arch Linux:

Installing PostgreSQL

We start by installing and initializing PostgreSQL and PostGIS. I use de_DE.UTF-8 as locale but you may pick any other valid locale, of course. I’m assuming that the operating system user you’ll use for this tutorial is osm.

> sudo pacman -S --noconfirm postgresql postgis
> sudo -u postgres initdb --locale de_DE.UTF-8 -E UTF8 -D '/var/lib/postgres/data'

I run OSM on a server with 12 cores, 64GB of RAM and 480GB SSD disks. For the initial import I use the following PostgreSQL settings in /var/lib/postgres/data/postgresql.conf:

shared_buffers = 4GB
work_mem = 100MB
maintenance_work_mem = 4GB
effective_io_concurrency = 2
fsync = off
synchronous_commit = off
full_page_writes = off
effective_cache_size = 16GB
autovacuum = off
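
Note that fsync, synchronous_commit, full_page_writes and autovacuum are switched off only to speed up the bulk import; they trade crash safety for speed. As a minimal sketch, assuming PostgreSQL 9.4 or later (for ALTER SYSTEM) and a superuser connection, the hypothetical helper below prints the statements that switch them back on once the import is finished. Keep in mind that values set with ALTER SYSTEM are written to postgresql.auto.conf and take precedence over postgresql.conf.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup): prints SQL that
# re-enables the settings that were switched off for the bulk import.
safe_settings_sql() {
    for setting in fsync synchronous_commit full_page_writes autovacuum; do
        printf 'ALTER SYSTEM SET %s = on;\n' "$setting"
    done
    # All four settings take effect on a configuration reload.
    printf 'SELECT pg_reload_conf();\n'
}

# Usage (assumes the osm superuser and database created further down):
#   safe_settings_sql | psql -d osm
```

Piping the statements into psql sends them one at a time, which matters because ALTER SYSTEM cannot run inside a transaction block.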

Next, we’ll add a user and a database that’s owned by the new user and we’ll enable the PostGIS and Hstore extensions for this database:

> sudo systemctl enable postgresql
> sudo systemctl start postgresql
> sudo -u postgres createuser -s osm
> sudo -u postgres createdb -E UTF-8 -l de_DE.UTF-8 -O osm osm
> sudo -u osm psql -c "create extension postgis;"
> sudo -u osm psql -c "create extension hstore;"

Downloading Planet OSM

Now we’re ready to download a copy of the latest Planet OSM file:

> mkdir ~/data
> cd ~/data
> wget http://planet.osm.org/pbf/planet-latest.osm.pbf
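
The planet file is large, so it may be worth verifying the download before spending hours on an import. The sketch below assumes that the planet server publishes an MD5 file next to the PBF (the .md5 URL in the usage comment is an assumption, check the server's directory listing) and that md5sum from coreutils is installed. The function name verify_md5 is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's MD5 sum with the first field of a
# checksum file. The file name stored inside the checksum file is ignored
# on purpose, since it may be a dated name rather than "planet-latest".
verify_md5() {
    file="$1"
    sumfile="$2"
    expected="$(awk 'NR == 1 { print $1 }' "$sumfile")"
    actual="$(md5sum "$file" | awk '{ print $1 }')"
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (the .md5 URL is an assumption):
#   wget http://planet.osm.org/pbf/planet-latest.osm.pbf.md5
#   verify_md5 planet-latest.osm.pbf planet-latest.osm.pbf.md5 && echo "checksum ok"
```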

Setting up Imposm3

Imposm3 at this time is the fastest importer for OpenStreetMap data into PostgreSQL/PostGIS. It is written in Go and comes with a number of nifty optimizations to speed up the import process.

A full import takes around 6 hours on my server and about 10 hours with the diff feature enabled. I’ve done a full import with osm2pgsql once that took a whopping 10 days!

Here’s how to install imposm3 on your Arch Linux system. First you install Go, then set a few environment variables and finally tell go to install imposm3:

> sudo pacman -S --noconfirm go go-tools

Now put these variables in your ~/.bashrc:

export GOPATH="$HOME/go"
export PATH="$GOPATH/bin:$PATH"

In a new shell (to pick up the newly set variables) you can now install imposm3:

> go get github.com/omniscale/imposm3

After a short while you’ll have imposm3 in ~/go/bin.

Configure imposm3

We’ll base our import in /home/osm/data and thus put the following JSON data into /home/osm/data/config.json:

{
    "cachedir": "/home/osm/data/imposm3_cache",
    "connection": "postgis://osm@localhost/osm",
    "mapping": "/home/osm/data/mapping.yml"
}

In mapping.yml imposm3 is told what to import into which table. The format is fully documented in the imposm3 documentation. Here’s what my mapping.yml looks like:

tags:
  load_all: true
  exclude:
  - created_by
  - source
  - "tiger:*"

tables:
  admin:
    fields:
    - name: osm_id
      type: id
    - name: geometry
      type: geometry
    - key: name
      name: name
      type: string
    - name: type
      type: mapping_value
    - key: admin_level
      name: admin_level
      type: integer
    - name: tags
      type: hstore_tags
    mapping:
      boundary:
      - administrative
    type: polygon
  airports:
    fields:
    - name: osm_id
      type: id
    - name: geometry
      type: geometry
    - key: name
      name: name
      type: string
    - key: alt_name
      name: alt_name
      type: string
    - key: int_name
      name: int_name
      type: string
    - key: ele
      name: elevation
      type: integer
    - key: iata
      name: iata
      type: string
    - key: icao
      name: icao
      type: string
    - key: website
      name: website
      type: string
    - key: wikipedia
      name: wikipedia
      type: string
    - key: ifr
      name: ifr
      type: string
    - key: vfr
      name: vfr
      type: string
    mapping:
      aeroway:
      - aerodrome
      - terminal
      - helipad
    type: point
  amenities:
    fields:
    - name: osm_id
      type: id
    - name: geometry
      type: geometry
    - key: name
      name: name
      type: string
    - name: type
      type: mapping_value
    - key: addr:street
      name: address
      type: string
    - key: addr:housenumber
      name: housenumber
      type: string
    - key: addr:postcode
      name: postcode
      type: string
    - key: addr:city
      name: city
      type: string
    - key: addr:housename
      name: housename
      type: string
    - key: opening_hours
      name: opening_hours
      type: string
    - key: phone
      name: phone
      type: string
    - key: website
      name: website
      type: string
    - name: tags
      type: hstore_tags
    mapping:
      amenity:
      - __any__
    type: point
  shops:
    fields:
    - name: osm_id
      type: id
    - name: geometry
      type: geometry
    - key: name
      name: name
      type: string
    - name: type
      type: mapping_value
    - key: addr:street
      name: address
      type: string
    - key: addr:postcode
      name: postcode
      type: string
    - key: addr:housenumber
      name: housenumber
      type: string
    - key: addr:housename
      name: housename
      type: string
    - key: addr:city
      name: city
      type: string
    - key: opening_hours
      name: opening_hours
      type: string
    - key: phone
      name: phone
      type: string
    - key: cuisine
      name: cuisine
      type: string
    - key: website
      name: website
      type: string
    - key: service
      name: service
      type: string
    - key: shop
      name: shop
      type: string
    - key: brand
      name: brand
      type: string
    - key: operator
      name: operator
      type: string
    - key: wheelchair
      name: wheelchair
      type: string
    - name: tags
      type: hstore_tags
    mapping:
      shop:
      - __any__
    type: point
  places:
    fields:
    - name: osm_id
      type: id
    - name: geometry
      type: geometry
    - key: name
      name: name
      type: string
    - name: type
      type: mapping_value
    - key: wikipedia
      name: wikipedia
      type: string
    - key: postal_code
      name: postal_code
      type: string
    - name: tags
      type: hstore_tags
    - args:
        values:
        - locality
        - suburb
        - hamlet
        - village
        - town
        - city
        - county
        - region
        - state
        - country
      name: z_order
      type: enumerate
    - key: population
      name: population
      type: integer
    mapping:
      place:
      - country
      - state
      - region
      - county
      - city
      - town
      - village
      - hamlet
      - suburb
      - locality
    type: point
  roads:
    fields:
    - name: osm_id
      type: id
    - name: geometry
      type: geometry
    - name: type
      type: mapping_value
    - key: name
      name: name
      type: string
    - key: tunnel
      name: tunnel
      type: boolint
    - key: bridge
      name: bridge
      type: boolint
    - key: oneway
      name: oneway
      type: direction
    - key: ref
      name: ref
      type: string
    - key: layer
      name: z_order
      type: wayzorder
    - key: access
      name: access
      type: string
    - key: service
      name: service
      type: string
    - name: class
      type: mapping_key
    filters:
      exclude_tags:
      - - area
        - 'yes'
    mappings:
      railway:
        mapping:
          railway:
          - rail
          - tram
          - light_rail
          - subway
          - narrow_gauge
          - preserved
          - funicular
          - monorail
          - disused
      roads:
        mapping:
          highway:
          - motorway
          - motorway_link
          - trunk
          - trunk_link
          - primary
          - primary_link
          - secondary
          - secondary_link
          - tertiary
          - tertiary_link
          - road
          - path
          - track
          - service
          - footway
          - bridleway
          - cycleway
          - steps
          - pedestrian
          - living_street
          - unclassified
          - residential
          - raceway
          man_made:
          - pier
          - groyne
    type: linestring
  transport_areas:
    fields:
    - name: osm_id
      type: id
    - name: geometry
      type: geometry
    - key: name
      name: name
      type: string
    - name: type
      type: mapping_value
    mapping:
      aeroway:
      - aerodrome
      - terminal
      - helipad
      - apron
      railway:
      - station
      - platform
    type: polygon
  transport_points:
    fields:
    - name: osm_id
      type: id
    - name: geometry
      type: geometry
    - key: name
      name: name
      type: string
    - name: type
      type: mapping_value
    - key: ref
      name: ref
      type: string
    mapping:
      aeroway:
      - aerodrome
      - terminal
      - helipad
      - gate
      highway:
      - motorway_junction
      - turning_circle
      - bus_stop
      railway:
      - station
      - halt
      - tram_stop
      - crossing
      - level_crossing
      - subway_entrance
    type: point

The Import

Compared to the steps that led us to this point, the actual process of importing the Planet OSM file is rather mundane. I want to keep my OSM database continuously up to date, so I use the -diff option, which slows the initial import down a little.

> cd ~/data
> imposm3 import -config config.json -read planet-latest.osm.pbf -write -optimize -overwritecache -diff
> imposm3 import -config config.json -deployproduction

Depending on your hardware the full import will take around 10-12 hours to complete. At the time of writing, the cache for the full import is roughly 50GB and the database takes up another 50GB. imposm3 first writes the OSM data into the import schema; the last step above then moves the tables from the import schema to the production schema.
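
Given those sizes, it can save a failed run to check the available disk space before starting the import. A minimal sketch, assuming a POSIX df; the function name has_free_gb is made up for this example, and the 50GB thresholds in the usage comment are simply the figures quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file system holding directory $1 has
# at least $2 GB available.
has_free_gb() {
    dir="$1"
    needed_gb="$2"
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    [ "$avail_kb" -ge $((needed_gb * 1024 * 1024)) ]
}

# Usage (if cache and database share one file system, check for the sum):
#   has_free_gb /home/osm/data 50 || echo "not enough space for the cache"
#   has_free_gb /var/lib/postgres/data 50 || echo "not enough space for the database"
```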

Keeping the data up to date

imposm3 can update the database from OSM Change files and osmosis will create these change files for us.

Install osmosis

Osmosis is available from the AUR, which we’ll use. The following steps are required to install osmosis:

> sudo pacman -S jdk8-openjdk
> cd
> git clone https://aur.archlinux.org/osmosis.git
> cd osmosis
> makepkg -sri

Osmosis setup

Once the import is done you will find a file called last.state.txt in /home/osm/data/imposm3_cache. I use /home/osm/data/updates as the working directory for updates. So that osmosis knows where to start, we copy last.state.txt there as state.txt:

> mkdir -p /home/osm/data/updates
> cp /home/osm/data/imposm3_cache/last.state.txt /home/osm/data/updates/state.txt

One more file to create for osmosis before we can update our data. The following goes into /home/osm/data/updates/configuration.txt:

# The URL of the directory containing change files.
baseUrl=http://planet.openstreetmap.org/replication/minute

# Defines the maximum time interval in seconds to download in a single invocation.
# Setting to 0 disables this feature.
maxInterval=0

This basically tells osmosis to pull in all changes since the state defined in state.txt at once.
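
To see which point in time the database currently reflects, you can read the values from state.txt. The sketch below assumes the usual osmosis state file layout, a Java properties file with sequenceNumber and timestamp keys in which the colons of the timestamp are escaped with a backslash. The function name state_value is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from an osmosis state.txt.
# The second sed call removes the backslashes that the properties format
# puts in front of the colons in the timestamp.
state_value() {
    key="$1"
    file="$2"
    sed -n "s/^${key}=//p" "$file" | sed 's/\\:/:/g'
}

# Usage:
#   state_value timestamp /home/osm/data/updates/state.txt
#   state_value sequenceNumber /home/osm/data/updates/state.txt
```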

I wrote a small shell script for the update process. Put the following commands in /home/osm/data/updates/update.sh and make sure update.sh is executable:

#!/bin/sh

(cd /home/osm/data/updates/ && \
osmosis --rri workingDirectory=. --wxc update.osc.gz && \
/home/osm/go/bin/imposm3 diff -quiet -config /home/osm/data/config.json update.osc.gz)

Update the data

By calling update.sh our OSM data will be brought up to the current state using the minutely change files OSM provides.

> cd ~/data/updates
> chmod 744 update.sh
> ./update.sh

Updating the data regularly

To permanently keep my OSM data up to date I created a systemd timer that runs every five minutes. The following service configuration goes into /etc/systemd/system/osm-update.service:

[Unit]
Description=OSM Update Job

[Service]
Type=oneshot
User=osm
Group=osm
WorkingDirectory=/home/osm/data/updates
ExecStart=/home/osm/data/updates/update.sh

Finally, we create a timer by putting the following text into /etc/systemd/system/osm-update.timer:

[Unit]
Description=Run osm-update.service every 5 minutes

[Timer]
OnCalendar=*:0/5

[Install]
WantedBy=timers.target

Enable the timer so that it survives a reboot, start it, and everything is set:

> sudo systemctl enable osm-update.timer
> sudo systemctl start osm-update.timer
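
To check that the timer is actually doing its job, you can ask systemd for the state of the units. A minimal sketch using the unit names from above; the function name unit_status is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: report whether a systemd unit is currently active.
unit_status() {
    if systemctl is-active --quiet "$1"; then
        echo "$1: active"
    else
        echo "$1: not active"
    fi
}

# Usage:
#   unit_status osm-update.timer
#   systemctl list-timers osm-update.timer       # shows the next scheduled run
#   journalctl -u osm-update.service -n 20       # shows output of recent runs
```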

Photo Credits

The Cinematic Country by Trey Ratcliff (licensed under CC BY-NC-SA 2.0).