maxxen's comments

maxxen | 6 months ago | on: MapLibre Tile: A next generation geospatial format optimized for rendering

This is cool. My only worry is that the implementation complexity will prevent widespread adoption outside of MapLibre. Although getting write support upstreamed into PostGIS might be all that's needed to make sure it trickles down into all the different tile servers. MVT is not the most efficient, but everything speaks protobuf and you can hack together a parser in an afternoon.

I've experimented a lot with vectorized encodings of geometries in DuckDB-spatial using the different nested types. You definitely do get very good compression out of the box if you already support a bunch of specialized lightweight compression algorithms. Simpler geometric properties are very fast to compute (e.g. area, length), but for anything more complex you usually need to do some pre-processing or conversion into an intermediate data structure (like creating a line-segment index for intersection checks, or a node graph for clipping) which dominates the processing time anyway. The cost of materializing the columnar format into a row-wise format and back again when doing joins or sorting is absolutely brutal on performance too, compared to just keeping geometries as serialized blobs that are easy to slice and memcpy.
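
To illustrate why simple properties such as area are cheap on a columnar layout, here is a minimal sketch. It assumes polygon rings stored as two flat coordinate columns plus an offsets column, which is a generic nested-list layout and not DuckDB-spatial's actual encoding. The names `polygon_areas`, `xs`, `ys` and `ring_offsets` are hypothetical.

```python
from typing import List


def polygon_areas(xs: List[float], ys: List[float], ring_offsets: List[int]) -> List[float]:
    """Shoelace area of each ring stored in flat coordinate columns.

    Assumption: ring i occupies indices ring_offsets[i] .. ring_offsets[i + 1] - 1
    and is closed (its first vertex is repeated as its last vertex).
    """
    areas = []
    for start, end in zip(ring_offsets, ring_offsets[1:]):
        twice_area = 0.0
        # A single linear pass over the coordinate columns, no intermediate structure.
        for i in range(start, end - 1):
            twice_area += xs[i] * ys[i + 1] - xs[i + 1] * ys[i]
        areas.append(abs(twice_area) / 2.0)
    return areas


# Two rings in one pair of columns: a unit square and a 4-by-3 right triangle.
xs = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0]
ys = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 3.0, 0.0]
ring_offsets = [0, 5, 9]
areas = polygon_areas(xs, ys, ring_offsets)
```

An intersection test has no such single-pass form, which is where the pre-processing cost described above comes in.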

That said, I do expect columnar encoding to work really well for rendering in the browser, where transfer speed is the big bottleneck. The paper mentions Arrow as an inspiration, but I wonder why the format isn't just based on (compressed) Arrow in its entirety? I'm not super up to speed on the Arrow ecosystem, but I know there are a couple of query engines that don't just use it internally on the CPU, but also to execute on the GPU. If you are going to decode and send the data over to WebGL, you might as well do the filtering/expression evaluation there too, no? (and leverage the existing techniques/code/interop in the Arrow world)

maxxen | 10 months ago | on: DuckDB is probably the most important geospatial software of the last decade

I think this is just because it hasn't been implemented in spatial yet. DuckDB is currently going through a pretty big refactor of the way we glob/scan/union multiple files with all the recent focus on data lake formats, but my plan is to get to it in spatial after the next release, when that part of the code has stabilized a bit.

maxxen | 10 months ago | on: DuckDB is probably the most important geospatial software of the last decade

I'm inclined to agree, but unfortunately a huge amount of the existing data and processes in this space does not assume a spheroidal earth and comes provided with a coordinate reference system. Ultimately there are also some domains where you have data that you explicitly don't want to interpret using spheroidal semantics, e.g. when working with a city plan - in which case the map _is_ the data model, and you definitely want the angles of a triangle to sum to 180 degrees.
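
The triangle remark can be checked numerically. The sketch below compares the interior angle sum of a planar triangle with that of a triangle drawn with great-circle edges. It assumes a perfect unit sphere rather than a spheroid to keep the math short, and the function names are hypothetical.

```python
import math


def planar_angle_sum(a, b, c):
    """Sum of interior angles (degrees) of a triangle with planar (x, y) vertices."""
    pts = [a, b, c]
    total = 0.0
    for i in range(3):
        vx, vy = pts[i]
        px, py = pts[(i + 1) % 3]
        qx, qy = pts[(i + 2) % 3]
        diff = abs(math.atan2(py - vy, px - vx) - math.atan2(qy - vy, qx - vx))
        total += math.degrees(min(diff, 2 * math.pi - diff))
    return total


def _unit(lat_deg, lon_deg):
    """Convert (lat, lon) in degrees to a unit vector on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def spherical_angle_sum(a, b, c):
    """Sum of interior angles (degrees) of a great-circle triangle with (lat, lon) vertices."""
    pts = [_unit(*p) for p in (a, b, c)]
    total = 0.0
    for i in range(3):
        v, p, q = pts[i], pts[(i + 1) % 3], pts[(i + 2) % 3]
        # The angle at v equals the angle between the planes of the two great circles
        # through v, i.e. the angle between their normals v x p and v x q.
        n1, n2 = _cross(v, p), _cross(v, q)
        cos_angle = _dot(n1, n2) / math.sqrt(_dot(n1, n1) * _dot(n2, n2))
        total += math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return total


flat = planar_angle_sum((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
# North pole plus two equator points 90 degrees of longitude apart.
curved = spherical_angle_sum((90.0, 0.0), (0.0, 0.0), (0.0, 90.0))
```

The planar triangle sums to 180 degrees, while the octant triangle on the sphere has three right angles, which is exactly the behaviour you don't want imposed on a city plan.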

maxxen | 10 months ago | on: DuckDB is probably the most important geospatial software of the last decade

I replied to another comment, but I think a big part is that DuckDB's spatial extension provides a SQL interface to a whole suite of standard FOSS GIS packages by statically bundling everything (including inlining the default PROJ database of coordinate projection systems into the binary) and providing it for multiple platforms (including WASM). I.e., there are no transitive dependencies except libc.

Yes, DuckDB does a whole lot more: vectorized larger-than-memory execution, columnar compressed storage and an ecosystem of other extensions that make it more than the sum of its parts. But while I've been working hard on making the spatial extension more performant and more broadly useful (I designed a new geometry engine this year, and spatial join optimization just got merged on the dev-branch), the fact that you can e.g. convert to and from a myriad of different geospatial formats by utilizing GDAL, transform through SQL, or pull down the latest Overture dump without having the whole workflow break just because you updated QGIS has probably been the main killer feature for a lot of the early adopters.

(Disclaimer, I work on duckdb-spatial @duckdblabs)

maxxen | 10 months ago | on: DuckDB is probably the most important geospatial software of the last decade

I think a big part is that DuckDB's spatial extension doesn't have any transitive dependencies (except libc). It statically packages the standard suite of FOSS GIS tools (including a whole database of coordinate systems) for multiple platforms (including WASM) and provides a unified SQL interface to it all.

(Disclaimer, I work on duckdb-spatial @duckdblabs)

maxxen | 1 year ago | on: DuckDB 1.1.0 Released

Thanks! I've been wanting to add this since I first started out working on DuckDB almost two years ago, but I finally managed to accumulate the time (and the skills required!) to finish it up over the summer. It still has a long way to go: support for indexes in extensions is pretty... raw, and we only push down constant filters into index scans (so no spatial index-join acceleration yet). But I think having a proper spatial index is one of those things that are kind of required to really elevate the spatial extension from being just a toy, and I'm super stoked to work more on it during the next release cycle and about all the new possibilities that it opens up.

maxxen | 2 years ago | on: Multi-database support in DuckDB

It does push down filters! Not sure if other nodes are pushed down, I think for now the database boundary is at the scan node, but you could add additional optimizer/rewrite rules in the postgres/sqlite/mysql extensions to push down other parts of the plans when applicable.
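
As a rough illustration of what filter pushdown at the scan boundary means, here is a minimal sketch using Python's built-in `sqlite3` module. It is not DuckDB's implementation; the `scan` function, the `(column, op, constant)` filter tuple and the `cities` table are all hypothetical.

```python
import sqlite3
from typing import Optional, Tuple

ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def scan(conn: sqlite3.Connection, table: str,
         pushed_filter: Optional[Tuple[str, str, object]] = None):
    """Hypothetical scan node over an attached database.

    If the optimizer hands it a (column, op, constant) filter, the filter is
    translated into a WHERE clause so the remote database does the work.
    """
    sql = f'SELECT * FROM "{table}"'
    params: tuple = ()
    if pushed_filter is not None:
        column, op, value = pushed_filter
        if op not in ALLOWED_OPS:
            raise ValueError(f"cannot push down operator {op!r}")
        sql += f' WHERE "{column}" {op} ?'
        params = (value,)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO cities VALUES (?, ?)",
                 [("a", 10), ("b", 200), ("c", 3000)])

# Without pushdown: every row crosses the database boundary, then gets filtered.
all_rows = scan(conn, "cities")
filtered_locally = [row for row in all_rows if row[1] > 100]

# With pushdown: only matching rows cross the boundary.
filtered_remotely = scan(conn, "cities", ("population", ">", 100))
```

Both paths return the same rows; the difference is how much data is read and transferred. Pushing down other plan nodes would mean teaching the scan to accept more than a single constant filter.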

maxxen | 3 years ago | on: DuckDB 0.7.0

Yes! I'm working hard on it. I've been distracted with other work for some of our clients, but hopefully I'll have something to show soon.