top | item 20466196

H3: Uber’s Hexagonal Hierarchical Spatial Index (2018)

262 points| felipap | 6 years ago |eng.uber.com | reply

74 comments

order
[+] jillesvangurp|6 years ago|reply
I did some indexing using elasticsearch and some home grown stuff based on geohashes about seven years ago. At the time Elasticsearch was just adding support for geoshape indexing as well. Initially this was also based on geohashes. Later they added proper quad tree support (instead of indexing the geohash as a term), and recently they revamped the implementation using BKD trees. The current implementation is way faster and scalable than what they had a few years back and allegedly quite nice for handling complex GIS data.

What's the advantage of Uber's approach over any of this. Even my primitive geohash solution worked pretty nicely and you can implement it on just about any type of DB. I had a simple algorithm to cover any shape with geohashes of a certain size as well as a quick way to generate a polygon for a circle. The two combined allowed me to do radius searches for any shape overlapping or contained by the circle with simple terms queries on the geohash prefixes. My main headache was keeping the number of terms in a query (i.e. number of geohashes) to a reasonable size (below 1024, which if I recall was a the default limit).

[+] mtrovo|6 years ago|reply
They go into some detail on this talk https://youtu.be/ay2uwtRO3QE?t=712.

What I get from their explanation is that hexagon is a better shape for map grids because they are the most complex shape that can tesselate (the other two are triangles and squares). As they are more close to a circle, distances within a cell are more stable, also computing the distance from a cell center to its neighbours is stable in hexagons as well.

I think the reason hex is not more common is that subdivisions are hard to create compared to triangles or squares. Uber solved this by subdividing in 7 smaller hex and tilting it so they cover the bigger shape with some small overlap.

Also a big problem is distortion, I never thought this would be that huge of a problem but it makes sense. They go into a lot of the details later on the same video.

[+] sroussey|6 years ago|reply
geohashes are cool, and lopping off a letter reduces precision.

But if you look at have they overlay London, you get quite a split higher up and two next to each other don’t look like they are and so I can see where the number of terms would get big.

The new ElasticSearch implementation is now the default and I think they are deprecating the prefix versions like geohash.

Too bad their stuff is not embeddable into an app. Know of lib for this?

[+] Mr_P|6 years ago|reply
I'd be curious to see a more thorough comparison to S2, which IMHO seems simpler (it's just quadtrees) and likely faster (it supports O(1) lookup vs. needing a hierarchical search).
[+] jandrewrogers|6 years ago|reply
It is complicated, and it depends on the use case. There are roughly three dimensions to what you are optimizing the representation for: presentation, computational geometry, and decomposition (sharding). S2 and H3 are both fundamentally cartography-driven representations, primarily optimizing for presentation. S2 focuses a bit more on sharding and H3 a bit more on computational geometry, there is quite a bit of literature on the tradeoffs of their characteristic designs. If the core application is not presentation driven, such as pure spatiotemporal analytics, neither of these representations are good choices.

Representation systems for geospatial data models is an amazingly deep theoretical rabbit hole. Common systems are almost always optimized for presentation as most were originally designed for cartographic use cases. If you were looking at representation systems optimized for fast, scalable geospatial analytics, for example, you'd use some type of 3-space embedding representation. There is a lot of diversity.

[+] pininja|6 years ago|reply
H3 and S2 are kinda similar, except by using hexagons, H3 grid centroids are equidistant - rectangles have different distances from center to center.

H3 also has efficient means for finding a cell’s neighbors, and comes with some nice algorithms - like the “compact” fill. See https://uber.github.io/h3/#/documentation/overview/use-cases

The hierarchical search data is built into a grid’s ID in both H3 and S2, which helps when comparing ID’s to see how close they are to each other.

For visualization, I prefer the way H3 looks over S2. While that’s just an opinion, H3 grid is exposed directly to drivers and in lots of tools.

I work for Uber and have used H3, but don’t work on the library.

[+] kndjckt|6 years ago|reply
The critical advantage of h3 vs. s2 is that all neighbors are equidistant from the central cell. Also, the implementation of the projection means there is less distortion than s2.

I use this library every day and absolutely love it.

[+] seanhandley|6 years ago|reply
>it supports O(1) lookup vs. needing a hierarchical search

Bear in mind that H3 only has 16 levels of resolution, so traversing from top level down to the square-metre level is still a constant time operation i.e. O(16)

[+] erichocean|6 years ago|reply
Seeing as Uber used to use S2, presumably it was worse for their needs.
[+] kiwidrew|6 years ago|reply
Uber really likes hexagons. Their internal name for driver promotions is even "hexcentive"! I've long wondered what compelled Uber to use hexagons to define their various promotion boundaries, because it's not a very user-friendly choice -- the lines on a map look very jagged, for instance.

This post goes some way towards explaining why hexagons:

* "minimize the quantization error introduced when users move through a city"

* "allow us to approximate radiuses easily"

* non-hexagonal regions (such as postal code boundaries) "have unusual shapes and sizes which are not helpful for analysis, and are subject to change"

[+] traverseda|6 years ago|reply
>Hexagons are just pentagons without one side

My favorite quote in the video

[+] kjeetgill|6 years ago|reply
Don't they have one more side?
[+] hamandcheese|6 years ago|reply
How does this compare to S2 from google? It is not mentioned in the post at all.

http://s2geometry.io/

[+] seanhandley|6 years ago|reply
You get more aesthetic visuals from H3.

S2 cells distort quite heavily depending on which part of the globe you're mapping.

[+] snek|6 years ago|reply
There is a deep comparison in the video in the post.
[+] scrappyjoe|6 years ago|reply
If anyone wants to use this in R, the h3-r package works quite nicely. It’s in C which, in my tests, is an order of magnitude faster than the v8 implementation. We use it to chop up spatial data into equally sized chunks that we can use as observations for regressions and the like. https://github.com/crazycapivara/h3-r
[+] seanhandley|6 years ago|reply
I work for a delivery company. We currently use H3 to help calculate ETAs in realtime.
[+] lvh|6 years ago|reply
Does that work? I’d expect you want to do a fairly traditional road-based ETA? Otherwise you’ll be wildly off if you’re in a cul de sac next to a major highway or on said highway, or in massive traffic vs not at all.
[+] no_identd|6 years ago|reply
I hate Uber, but I now opened this issue suggesting a way to massively increase the performance of this anyway:

https://github.com/uber/h3/issues/258

(Note: by "massively" I mean pushing it to O(1) for the fundamental coordinate operations. So, perhaps read that as "most massively".)

[+] lacker|6 years ago|reply
I might be the only one pedantically annoyed here, but... the very first image on this page contains a mix of hexagons and pentagons.

Which is it - an inaccurate image, and the system is really hexagon-based, or does the system also use pentagons? My guess is that it is actually all hexagon-based since the overlap between granularity is already only approximate.

[+] nexuist|6 years ago|reply
Is UberCon going on or something? Why so many releases in the past few days? I'm not complaining; just curious.
[+] kyruzic|6 years ago|reply
This article is over a year old.
[+] drewm1980|6 years ago|reply
I wonder why healpix from NASA wasn't an option for them. Maybe the nonpermissive license...
[+] evrydayhustling|6 years ago|reply
Why does each cell contain seven finer cells instead of six, resulting in imperfect containment?
[+] wlesieutre|6 years ago|reply
You can't divide a hexagon into 6 regular hexagons, that would mean using triangular cells to subdivide the hexagon.

Which isn't to say that you couldn't tile the planet with triangles, but they point out that the consistent relationship between neighboring hexagon tiles is useful:

>Using a hexagon as the cell shape is critical for H3. As depicted in Figure 6, hexagons have only one distance between a hexagon centerpoint and its neighbors’, compared to two distances for squares or three distances for triangles. This property greatly simplifies performing analysis and smoothing over gradients.

As you noticed, you can't divide a hexagon into 7 regular hexagons either. But it's apparently close enough:

>H3 supports sixteen resolutions. Each finer resolution has cells with one seventh the area of the coarser resolution. Hexagons cannot be perfectly subdivided into seven hexagons, so the finer cells are only approximately contained within a parent cell.

[+] flamtap|6 years ago|reply
I wonder if it has something to do with the number of hexes it takes to cover a sphere with? It seems to me like the larger hexes/pentagons are mostly just illustrative borders.
[+] pfortuny|6 years ago|reply
They are not all hexagons on the top picture: there is no hexagonal polyhedron (with regular hexagons). You can see pentagons here and there.
[+] the_seraphim|6 years ago|reply
because you can't do that, that's not how shapes work.
[+] Animats|6 years ago|reply
There's a lot to be said for a hexagonal grid. I looked at this once for a collision detection system. It's convenient to get rid of the special cases needed where four cells meet. You have to invent some new bookkeeping, but it can be worth it.
[+] jacobolus|6 years ago|reply
The aliasing / moiré artifacts are also dramatically less objectionable with hexagonal grids.

Now that image sensors and e.g. smartphone displays are very high resolution, and most content ends up going through multiple resampling routines on its way from capture -> storage/processing -> display, I would love to see people start to experiment with hexagon-grid cameras and displays. The slightly more complicated resampling wouldn’t really be a big deal for modern DSP hardware, and the visual output could be significantly better for the same pixel count.

[+] bayesian_horse|6 years ago|reply
I love the aesthetics of hexagons.

It should be possible to turn this projection into quite a nice 3D printable model. Challenging, though. Probably one could try to split the sphere into equal hexagons for printing and then assemble.

[+] jpab|6 years ago|reply
You can't make a sphere entirely out of hexagons; the math doesn't work [1]. In the article they note that they include 12 pentagons too, to solve this problem. Neatly, they arrange to have all the pentagons in areas of water where they presumably won't have to analyse traffic patterns for a while.

(I mention it mostly because I think it's an interesting little mathematical factoid.)

[1] https://math.stackexchange.com/q/2121175

[+] jeanlucas|6 years ago|reply
Second time seeing it, would love to use it just for fun, but no idea what to do