Logo

dev-resources.site

for different kinds of informations.

More on Geohashing: Covering and an updated DynamoDB library

Published at
2/5/2020
Categories
geo
guide
dynamodb
Author
phm200
Categories
3 categories in total
geo
open
guide
open
dynamodb
open
Author
6 person written this
phm200
open
More on Geohashing: Covering and an updated DynamoDB library

This is the third part in a series of posts. See the links above to jump to the prior posts

S2 vs. H3 Covering

In my last post I talked through how we can use S2 cells to fill or cover a shape on the map. In this case a circle, within which we want to search for items of interest. I also mentioned that Uber came up with H3 cells, a technique for covering a shape on the map with hexagons.

The images below show S2 and H3 coverings of the same area in Washington, DC.

S2 Covering (shown in blue)
S2 Covering downtown DC

H3 Covering (shown in red)
H3 Covering downtown DC

The S2 covering is composed of cells of different sizes or levels. Some are larger cells that fit mostly into the circle, others are smaller cells that help cover the edges. This means fewer cells overall are needed to cover the space vs the H3 example, albeit with the downside of a more uneven covering compared to what we see in H3.

With that somewhat uneven covering, we can see how important that distance calculation is in the final step of our S2 geo-search from the prior post. It excludes those points of interest that came from the outer edges of covering cells.

These images also illustrate how S2 cells can nest. Any given S2 cell will fit completely into its parent cell at a higher level. S2 cells nesting allow us to calculate coverings with diverse cell sizes. Nesting also would allow us to roll up location based data into larger aggregates. If data is stored at a small (high level) cell, then we can roll up that data to a neighborhood, city or region level by summing up the data from all the child cells in a bigger cell that covers the target area.

In contrast while H3 provides a more precise covering (at least a high resolution), hexagons do not nest. As we scale up levels, the hexagons overlap incompletely. So we cannot have a meaningful H3 covering with diverse hexagon sizes. Without nesting, aggregation is not as automatic. While we can still roll up data from the smaller hexagons, we cannot map it directly to larger ones.

Dash-Labs dynamodb-geo

While watching a brilliant discussion on Twitch between DynamoDB gurus Rick Houlihan and Alex DeBrie, at around an hour in, Rick dropped a quick reference to an AWS customer who had taken the DynamoDB Geo library I looked at last post and improved upon it. Thanks to twitch user switch184 in the comments for pointing me to the repo at: https://github.com/Dash-Labs/dynamodb-geo

#Geo Library for Amazon DynamoDB

Build Status

This library was forked from the AWS geo library.

Following limitations with the aws geo library were the main reasons that necessitated this fork:

  • Usage required a table’s hash and range key to be replaced by geo data. This approach is not feasible as it cannot be used when performing user-centric queries, where the hash and range key have to be domain model specific attributes.
  • Developed prior to GSI, hence only used LSI
  • No solution for composite queries. For e.g. “Find something within X miles of lat/lng AND category=‘restaurants’;
  • The solution executed the queries and returned the final result. It did not provide the client with any control over the query execution.

What methods are available for geo-querying in this library?

  • Query for a given lat/long
  • Radius query
  • Box/Rectangle query

All of the above queries can be run as composite queries, depending on their…

While this fork of the original dynamodb-geo is in Java, not JavaScript, it is worth your time to take a look. There are many improvements, which are summarized in the repo's readme.

One bit of documentation that stuck out to me was more discussion on hash key length, including the number of queries the library produces for a radius search with a given hash key. Something that tripped me up at first is that the geohash key they are talking about here is the first X digits of an S2 cell id, and the length of that key does not match 1:1 to the level of the cell. As the Dash-Labs repo suggests, a 5 or 6 digit long geohash key is well suited for near proximity searches. I still struggle with understanding the math behind that assertion, but the sample query results are convincing.

Thanks for reading and happy geohashing!

Featured ones: