Map

Today, I am going to look at geospatial data in Apache Druid in some more detail. Back in September, we learned how to ingest these data into Druid.

It turns out that a spatial dimension is in fact not much more than a string in which the x and y coordinates (or latitude and longitude) are separated by a comma. (You could also have more than two dimensions, but the filtering semantics are tailored for two dimensions.)
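To make the string encoding concrete, here is a minimal sketch in Python. The helper names encode_point and decode_point are made up for illustration and are not part of Druid; the sketch only mimics the comma-separated format described above.

```python
def encode_point(lat, lon):
    """Join two coordinates into a comma-separated string (hypothetical helper)."""
    return f"{lat},{lon}"


def decode_point(value):
    """Split a comma-separated coordinate string back into a tuple of floats."""
    return tuple(float(part) for part in value.split(","))


berlin = encode_point(52.52, 13.405)
print(berlin)                # 52.52,13.405
print(decode_point(berlin))  # (52.52, 13.405)
```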

Data Generation

First, we need to generate some test data. I am using the Python Faker module in a small script like this:

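import csv
import sys

from faker import Faker

fake = Faker()

def main():
    # csv.writer quotes any field that happens to contain a comma
    writer = csv.writer(sys.stdout, lineterminator="\n")
    writer.writerow(["latitude", "longitude", "place_name", "country_code", "timezone"])

    for _ in range(10000):
        # location_on_land() returns a tuple of strings:
        # (latitude, longitude, place_name, country_code, timezone)
        writer.writerow(fake.location_on_land())

if __name__ == "__main__":
    main()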

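Ingest these data into Druid as described in my previous post. The queries below assume that the data source is named geo_data_ww and that latitude and longitude have been combined into a spatial dimension named coordinates.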
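As a reminder, the part of the ingestion spec that matters here is the spatialDimensions entry inside dimensionsSpec. The fragment below is only a sketch, written as a Python dictionary so that it can be printed as JSON. The column names come from the CSV header above and the dimension name coordinates from the queries in this post; everything else in your ingestion spec stays as it is, and the exact layout may differ between Druid versions.

```python
import json

# Sketch of a dimensionsSpec fragment (not a complete ingestion spec).
# "coordinates" is the spatial dimension that the queries refer to;
# "latitude" and "longitude" are the input columns from the generated CSV.
dimensions_spec = {
    "dimensions": ["place_name", "country_code", "timezone"],
    "spatialDimensions": [
        {"dimName": "coordinates", "dims": ["latitude", "longitude"]}
    ],
}

print(json.dumps(dimensions_spec, indent=2))
```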

Querying

Let’s start with a simple SQL query. I want a list of places with their coordinates, and how often they occurred in my file:

SELECT
  coordinates,
  place_name,
  COUNT(*)
FROM 
  geo_data_ww
GROUP BY 1, 2

In order to do my spatial magic, I need to convert this SQL code to a Druid native query. This is an easy interactive process in the Druid query console.

First, use the Explain function to show the Druid native query that is generated:

[Screenshot: Explain]

Then, use the Open Query button to open the native query in the editor, instead of the SQL.

[Screenshot: Open Query]

Spatial Filtering!

Now, let’s introduce a filter! I am going to try to select only the places within a rectangle that roughly contains my home country, Germany. So I want to cover the area from 47°N to 55°N latitude and from 6°E to 15°E longitude.

Find the place in the query where it says "filter": null, and replace it with the following snippet:

  "filter": {
    "type": "spatial",
    "dimension": "coordinates",
    "bound": {
        "type": "rectangular",
        "minCoords": [47.0, 6.0],
        "maxCoords": [55.0, 15.0]
    }
  }
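What the rectangular bound does can be sketched in a few lines of Python. This is not Druid's implementation, just an illustration of the semantics, and it assumes that the bounds are inclusive. The function name in_rectangle and the three city coordinates are my own additions.

```python
def in_rectangle(point, min_coords, max_coords):
    """Return True if every coordinate lies between the matching min and max."""
    return all(lo <= v <= hi for v, lo, hi in zip(point, min_coords, max_coords))


# Same corners as in the filter: [latitude, longitude]
MIN_COORDS = [47.0, 6.0]
MAX_COORDS = [55.0, 15.0]

berlin = (52.52, 13.405)
paris = (48.8566, 2.3522)
prague = (50.0755, 14.4378)

print(in_rectangle(berlin, MIN_COORDS, MAX_COORDS))  # True
print(in_rectangle(paris, MIN_COORDS, MAX_COORDS))   # False
print(in_rectangle(prague, MIN_COORDS, MAX_COORDS))  # True
```

Prague passes the check although it is not in Germany: a rectangle is only a rough approximation of a country's outline.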

[Screenshot: Filter]

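Note how the places that are listed in the result are almost all in Germany! The few exceptions lie in neighboring countries that the rectangle also covers. Besides the rectangular bound, you can also define circle (bound type "radius") and polygon filters. Spatial dimensions and filters are documented here.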
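For comparison, here is a sketch of a circle filter, again built as a Python dictionary and printed as JSON so that it can be pasted into the native query. The center (roughly Berlin) and the radius of 1.0 are example values I picked. I am assuming that the radius is expressed in the same units as the coordinates, so please check the documentation for your Druid version.

```python
import json

# Sketch of a spatial filter with a "radius" bound.
# Assumption: the radius uses the same units as the coordinates (degrees here).
radius_filter = {
    "type": "spatial",
    "dimension": "coordinates",
    "bound": {
        "type": "radius",
        "coords": [52.52, 13.405],  # center: [latitude, longitude]
        "radius": 1.0,
    },
}

print(json.dumps({"filter": radius_filter}, indent=2))
```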

Learnings

  • Spatial dimensions are encoded as strings.
  • Spatial filters are supported by the Druid native query language only.
  • You can use the Explain function in the Druid console to convert a SQL query into its Druid native equivalent.