I did some indexing using Elasticsearch and some home-grown stuff based on geohashes about seven years ago. At the time Elasticsearch was just adding support for geoshape indexing as well. Initially this was also based on geohashes. Later they added proper quadtree support (instead of indexing the geohash as a term), and recently they revamped the implementation using BKD trees. The current implementation is far faster and more scalable than what they had a few years back, and allegedly quite nice for handling complex GIS data.
What's the advantage of Uber's approach over any of this? Even my primitive geohash solution worked pretty nicely, and you can implement it on just about any type of DB. I had a simple algorithm to cover any shape with geohashes of a certain size, as well as a quick way to generate a polygon for a circle. The two combined allowed me to do radius searches for any shape overlapping or contained by the circle with simple terms queries on the geohash prefixes. My main headache was keeping the number of terms in a query (i.e. the number of geohashes) to a reasonable size (below 1024, which if I recall was the default limit).
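A minimal sketch of that prefix trick (illustrative Python, not the original system): truncating a geohash yields the enclosing coarser cell, which is why covering a shape with fixed-size geohashes lets you do radius searches with plain prefix/terms matches.

```python
# Standard base32 geohash encoder; truncating the result gives the
# coarser cell containing the same point.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, length=6):
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    result, ch, bit, even = [], 0, 0, True
    while len(result) < length:
        # Bits alternate: longitude first, then latitude.
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = ch * 2 + 1
            rng[0] = mid
        else:
            ch = ch * 2
            rng[1] = mid
        even = not even
        bit += 1
        if bit == 5:  # five bits per base32 character
            result.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(result)

h = geohash_encode(57.64911, 10.40744, 6)
print(h)      # u4pruy
print(h[:4])  # u4pr -- the coarser cell enclosing the same point
```

A terms query for cells inside a circle then boils down to matching any of the covering geohash prefixes, which is where the term-count limit the parent comment mentions starts to bite.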
What I get from their explanation is that hexagons are a better shape for map grids because they are the most complex regular shape that can tessellate the plane (the other two are triangles and squares). Because they are closer to a circle, distances within a cell are more uniform, and the distance from a cell center to each of its neighbours is constant as well.
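The "constant neighbour distance" point is easy to check numerically; here is a quick sketch comparing center-to-neighbour distances on square and hexagonal grids (the axial-coordinate convention is my assumption, not anything from the thread):

```python
import math

# Square grid: 8 surrounding cells (4 edge neighbours + 4 corner neighbours).
square_offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0)]
square_dists = sorted({round(math.hypot(dx, dy), 9)
                       for dx, dy in square_offsets})

# Hexagonal grid (pointy-top, axial coordinates): 6 neighbours.
hex_offsets = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def axial_to_xy(q, r):
    return (math.sqrt(3) * q + math.sqrt(3) / 2 * r, 1.5 * r)

hex_dists = sorted({round(math.hypot(*axial_to_xy(q, r)), 9)
                    for q, r in hex_offsets})

print(square_dists)  # two distinct distances: 1 and sqrt(2)
print(hex_dists)     # a single distance: sqrt(3)
```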
I think the reason hexagons are not more common is that subdivisions are hard to create compared to triangles or squares. Uber solved this by subdividing each hexagon into seven smaller hexagons and rotating them slightly so they cover the bigger shape with some small overlap.
Distortion is also a big problem; I never thought it would be such a huge issue, but it makes sense. They go into a lot of the details later in the same video.
geohashes are cool, and lopping off a letter reduces precision.
But if you look at how they overlay London, you get quite a split higher up, and two cells next to each other on the map don't look like they are adjacent in the encoding, so I can see where the number of terms would get big.
The new Elasticsearch implementation is now the default, and I think they are deprecating the prefix-based versions like geohash.
Too bad their stuff is not embeddable into an app. Know of a lib for this?
I'd be curious to see a more thorough comparison to S2, which IMHO seems simpler (it's just quadtrees) and likely faster (it supports O(1) lookup vs. needing a hierarchical search).
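For context on the O(1) claim: quadtree-style cell ids make containment a prefix check on the id rather than a tree walk. A toy sketch of the idea (this is the general principle, not S2's actual encoding):

```python
# A cell id is just the sequence of quadrant choices (0..3) taken from
# the root, packed two bits per level. "Is cell A inside cell B?" then
# reduces to dropping A's extra levels and comparing ids -- no search.

def cell_id(quadrants):
    """Pack a path of quadrant choices (each 0..3) into (id, level)."""
    cid = 0
    for q in quadrants:
        cid = (cid << 2) | q
    return cid, len(quadrants)

def contains(parent, child):
    (pid, plevel), (cid, clevel) = parent, child
    if plevel > clevel:
        return False
    # Strip the child's extra levels; what remains must equal the parent.
    return (cid >> (2 * (clevel - plevel))) == pid

a = cell_id([2, 1, 3])        # a level-3 cell
b = cell_id([2, 1, 3, 0, 2])  # a level-5 cell under the same path
c = cell_id([2, 2, 3, 0, 2])  # a level-5 cell on a different path

print(contains(a, b))  # True
print(contains(a, c))  # False
```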
It is complicated, and it depends on the use case. There are roughly three dimensions to what you are optimizing the representation for: presentation, computational geometry, and decomposition (sharding). S2 and H3 are both fundamentally cartography-driven representations, primarily optimizing for presentation. S2 focuses a bit more on sharding and H3 a bit more on computational geometry; there is quite a bit of literature on the tradeoffs of their characteristic designs. If the core application is not presentation driven, such as pure spatiotemporal analytics, neither of these representations is a good choice.
Representation systems for geospatial data models are an amazingly deep theoretical rabbit hole. Common systems are almost always optimized for presentation, as most were originally designed for cartographic use cases. If you were looking at representation systems optimized for fast, scalable geospatial analytics, for example, you'd use some type of 3-space embedding representation. There is a lot of diversity.
The critical advantage of H3 vs. S2 is that all neighbors are equidistant from the central cell. Also, the implementation of the projection means there is less distortion than in S2.
I use this library every day and absolutely love it.
>it supports O(1) lookup vs. needing a hierarchical search
Bear in mind that H3 only has 16 levels of resolution, so traversing from top level down to the square-metre level is still a constant time operation i.e. O(16)
Uber really likes hexagons. Their internal name for driver promotions is even "hexcentive"! I've long wondered what compelled Uber to use hexagons to define their various promotion boundaries, because it's not a very user-friendly choice -- the lines on a map look very jagged, for instance.
This post goes some way towards explaining why hexagons:
* "minimize the quantization error introduced when users move through a city"
* "allow us to approximate radiuses easily"
* non-hexagonal regions (such as postal code boundaries) "have unusual shapes and sizes which are not helpful for analysis, and are subject to change"
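On the "approximate radiuses easily" point: the cells within k grid steps of a hexagonal cell form a compact, roughly circular disk of 1 + 3k(k+1) cells (H3 calls this a k-ring). A small sketch using axial coordinates (my convention, not H3 code):

```python
import itertools

def disk_size(k):
    # Axial coordinates (q, r); hex grid distance from the origin is
    # max(|q|, |r|, |q + r|), so count the cells where that is <= k.
    return sum(1 for q, r in itertools.product(range(-k, k + 1), repeat=2)
               if max(abs(q), abs(r), abs(q + r)) <= k)

for k in range(4):
    print(k, disk_size(k), 1 + 3 * k * (k + 1))  # counts match 1 + 3k(k+1)
```

So a radius query is approximated by picking the k whose disk best covers the circle, with no awkward corner cases like a square grid's diagonals.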
If anyone wants to use this in R, the h3-r package works quite nicely. It's in C, which, in my tests, is an order of magnitude faster than the V8 (JavaScript) implementation. We use it to chop up spatial data into equally sized chunks that we can use as observations for regressions and the like.
https://github.com/crazycapivara/h3-r
Does that work? I’d expect you want to do a fairly traditional road-based ETA? Otherwise you’ll be wildly off if you’re in a cul de sac next to a major highway or on said highway, or in massive traffic vs not at all.
I might be the only one pedantically annoyed here, but... the very first image on this page contains a mix of hexagons and pentagons.
Which is it - an inaccurate image, and the system is really hexagon-based, or does the system also use pentagons? My guess is that it is actually all hexagon-based since the overlap between granularity is already only approximate.
You can't divide a hexagon into six regular hexagons; that would mean using triangular cells to subdivide the hexagon.
Which isn't to say that you couldn't tile the planet with triangles, but they point out that the consistent relationship between neighboring hexagon tiles is useful:
>Using a hexagon as the cell shape is critical for H3. As depicted in Figure 6, hexagons have only one distance between a hexagon centerpoint and its neighbors’, compared to two distances for squares or three distances for triangles. This property greatly simplifies performing analysis and smoothing over gradients.
As you noticed, you can't divide a hexagon into 7 regular hexagons either. But it's apparently close enough:
>H3 supports sixteen resolutions. Each finer resolution has cells with one seventh the area of the coarser resolution. Hexagons cannot be perfectly subdivided into seven hexagons, so the finer cells are only approximately contained within a parent cell.
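The quoted aperture-7 rule pins down the cell counts: starting from H3's 122 base cells (110 hexagons plus 12 pentagons), each hexagon gets seven children and each pentagon six, so counts follow 2 + 120·7^r. A quick arithmetic check (a sketch of the counting, not H3 code):

```python
# Cell counts implied by aperture-7 subdivision: every hexagon has 7
# children, every pentagon has 6 (one pentagon plus five hexagons), and
# the 12 pentagons persist at every resolution.

def cell_count(res):
    pentagons, hexagons = 12, 110  # H3's 122 base cells
    for _ in range(res):
        hexagons = 7 * hexagons + 5 * pentagons
    return pentagons + hexagons

for r in range(4):
    print(r, cell_count(r))  # 122, 842, 5882, 41162 (= 2 + 120 * 7**r)
```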
I wonder if it has something to do with the number of hexes it takes to cover a sphere? It seems to me like the larger hexes/pentagons are mostly just illustrative borders.
There's a lot to be said for a hexagonal grid. I looked at this once for a collision detection system. It's convenient to get rid of the special cases needed where four cells meet. You have to invent some new bookkeeping, but it can be worth it.
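One common flavor of that bookkeeping (a sketch of the standard "odd-r" offset scheme, not necessarily what the parent comment used): hexes live in an ordinary 2D array with odd rows shifted half a cell, so the six neighbor offsets depend on row parity.

```python
# "Odd-r" offset coordinates: store hexes in a normal (col, row) grid
# with odd rows shifted right. The price is two neighbor tables, chosen
# by row parity -- the "new bookkeeping" a hex grid demands.

NEIGHBORS_EVEN_ROW = [(+1, 0), (0, -1), (-1, -1), (-1, 0), (-1, +1), (0, +1)]
NEIGHBORS_ODD_ROW  = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (0, +1), (+1, +1)]

def hex_neighbors(col, row):
    offsets = NEIGHBORS_ODD_ROW if row % 2 else NEIGHBORS_EVEN_ROW
    return [(col + dc, row + dr) for dc, dr in offsets]

print(hex_neighbors(2, 2))  # even row
print(hex_neighbors(2, 3))  # odd row: different column offsets
```

In exchange, every cell has exactly six neighbors and there is no four-cells-meet-at-a-corner case to special-case in collision checks.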
The aliasing / moiré artifacts are also dramatically less objectionable with hexagonal grids.
Now that image sensors and e.g. smartphone displays are very high resolution, and most content ends up going through multiple resampling routines on its way from capture -> storage/processing -> display, I would love to see people start to experiment with hexagon-grid cameras and displays. The slightly more complicated resampling wouldn’t really be a big deal for modern DSP hardware, and the visual output could be significantly better for the same pixel count.
It should be possible to turn this projection into quite a nice 3D printable model. Challenging, though. Probably one could try to split the sphere into equal hexagons for printing and then assemble.
You can't make a sphere entirely out of hexagons; the math doesn't work [1]. In the article they note that they include 12 pentagons too, to solve this problem. Neatly, they arrange to have all the pentagons in areas of water where they presumably won't have to analyse traffic patterns for a while.
(I mention it mostly because I think it's an interesting little mathematical factoid.)
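The factoid can be checked with Euler's formula: for a sphere tiled by p pentagons and h hexagons with three faces meeting at every vertex, V - E + F = 2 works out to p/6 = 2, forcing p = 12 regardless of h. A quick numeric check:

```python
# Each edge is shared by two faces and each vertex by three, so with
# p = 12 the edge and vertex counts below are exact integers
# (5*12 + 6*h = 60 + 6h is always divisible by 6).

def euler_characteristic(p, h):
    faces = p + h
    edges = (5 * p + 6 * h) // 2
    vertices = (5 * p + 6 * h) // 3
    return vertices - edges + faces

for h in (0, 20, 110):  # dodecahedron, soccer ball, H3's base layer
    print(h, euler_characteristic(12, h))  # always 2
```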
pininja | 6 years ago:
H3 also has efficient means for finding a cell’s neighbors, and comes with some nice algorithms - like the “compact” fill. See https://uber.github.io/h3/#/documentation/overview/use-cases
The hierarchical search data is built into a grid’s ID in both H3 and S2, which helps when comparing ID’s to see how close they are to each other.
For visualization, I prefer the way H3 looks over S2. While that’s just an opinion, H3 grid is exposed directly to drivers and in lots of tools.
I work for Uber and have used H3, but don’t work on the library.
traverseda | 6 years ago:
My favorite quote in the video
hamandcheese | 6 years ago:
http://s2geometry.io/
seanhandley | 6 years ago:
S2 cells distort quite heavily depending on which part of the globe you're mapping.
no_identd | 6 years ago:
https://github.com/uber/h3/issues/258
(Note: by "massively" I mean pushing it to O(1) for the fundamental coordinate operations. So, perhaps read that as "most massively".)
[1] https://math.stackexchange.com/q/2121175