Hilmar Buchta

PowerBI | PowerQuery

This blog post is about using XML data from OpenStreetMap in PowerQuery for example to create a local geocoding database. PowerQuery is just perfect for this purpose as it allows us to interactively shape the input data to the desired format.

First, we need to download some data from OpenStreetMap. You may go to the OpenStreetMap webpage, zoom to the desired area and then hit the export button. Alternatively, download links are provided for full country or region files after hitting ‘Export’.

image

The downloaded OSM file is in XML format and starts like this:

<?xml version=“1.0″ encoding=“UTF-8″?>
<osm version=“0.6″ generator=“CGImap 0.3.3 (733 thorn-01.openstreetmap.org)“ copyright=“OpenStreetMap and contributors“ attribution=“
http://www.openstreetmap.org/copyright“ license=“http://opendatacommons.org/licenses/odbl/1-0/“>

Now, open Excel and create a new PowerQuery from an XML file:

image

Next, select the downloaded file and hit Enter. PowerQuery loads with the file contents:

image

The data we’re interested in is stored in the nodes-table. So we can now click on the ‘Table’ link in the nodes column:

image

We now have to expand the ‘tag’ information:

image

Since we’re not interested in all the data from the file, we limit the information to country, city, postcode, street and housenumber by using the filter function:

image

We can now delete all columns but the following:

  • tag.attribute.k
  • tag.attribute.v
  • attribute:id
  • attribute:lat
  • attribute:long

image

image

This is pretty much the data we’d like to obtain from the OSM file. However, we still need to pivot to information from the rows into columns to see city, postcode etc. side by side with the geo coordinates. In order to do so, we select column tag.Attribut:k and use the Pivot function from the Transform ribbon as shown below:

image

Finally, we can rename the columns and filter for rows where country is not null:

image

The list now contains addresses (city, postcode, street, house number) together with the corresponding geographical coordinates (latitude, longitude).

For your reference, this is the script we created so far:

let
Source = Xml.Tables(File.Contents(„C:\Temp\download.osm“)),
node = Source{0}[node],
#“Changed Type“ = Table.TransformColumnTypes(node,{{„Attribute:id“, Int64.Type}, {„Attribute:visible“, type logical}, {„Attribute:version“, Int64.Type}, {„Attribute:changeset“, Int64.Type}, {„Attribute:timestamp“, type datetime}, {„Attribute:user“, type text}, {„Attribute:uid“, Int64.Type}, {„Attribute:lat“, type number}, {„Attribute:lon“, type number}}),
#“Expand tag“ = Table.ExpandTableColumn(#“Changed Type“, „tag“, {„Attribute:k“, „Attribute:v“}, {„tag.Attribute:k“, „tag.Attribute:v“}),
#“Filtered Rows“ = Table.SelectRows(#“Expand tag“, each ([#“tag.Attribute:k“] = „addr:city“ or [#“tag.Attribute:k“] = „addr:country“ or [#“tag.Attribute:k“] = „addr:housenumber“ or [#“tag.Attribute:k“] = „addr:postcode“ or [#“tag.Attribute:k“] = „addr:street“)),
#“Removed Columns“ = Table.RemoveColumns(#“Filtered Rows“,{„Attribute:visible“, „Attribute:version“, „Attribute:changeset“, „Attribute:timestamp“, „Attribute:user“, „Attribute:uid“}),
#“Pivoted Column“ = Table.Pivot(#“Removed Columns“, List.Distinct(#“Removed Columns“[#“tag.Attribute:k“]), „tag.Attribute:k“, „tag.Attribute:v“),
#“Renamed Columns“ = Table.RenameColumns(#“Pivoted Column“,{{„Attribute:id“, „id“}, {„Attribute:lat“, „latitude“}, {„Attribute:lon“, „longitude“}, {„addr:city“, „city“}, {„addr:postcode“, „postcode“}, {„addr:street“, „street“}, {„addr:country“, „country“}, {„addr:housenumber“, „housenumber“}}),
#“Filtered Rows1″ = Table.SelectRows(#“Renamed Columns“, each ([country] = „DE“))
in
#“Filtered Rows1″

The result can then be loaded to a PowerPivot data model (click on File, then Load to…):

image

After loading the data into the model, you can for example use PowerMaps to visualize the addresses on a map:

image

Of course, you could have received the similar result using the geocoding functionality of PowerMaps. Therefore, you simply need to add another column for the full address like this:

image

The result is pretty much the same but it takes a significant amount of time to geo code many addresses on this level of detail compared to the direct approach from above, which doesn’t need to pass geo coding web service because we already provided latitude and longitude.

image

So, if you have to geo code a lot of addresses, the OpenStreetMap XML format may be a lot of help. Using PowerQuery makes it easy to load this data into a PowerPivot data model. For larger regions make sure you have enough memory available in the local machine (and use the 64bit version of Microsoft Excel).