maxxen | 5 months ago | on: Vector database that can index 1B vectors in 48M
maxxen | 6 months ago | on: MapLibre Tile: A next generation geospatial format optimized for rendering
I've experimented a lot with vectorized encodings of geometries in DuckDB-spatial using the different nested types. You definitely do get very good compression out of the box if you already support a bunch of specialized lightweight compression algorithms. Simpler geometric properties (e.g. area, length) are very fast to compute, but for anything more complex you usually need to do some pre-processing or conversion into an intermediate data structure (like creating a line-segment index for intersection checks, or a node graph for clipping), which dominates the processing time anyway. The cost of materializing the columnar format into a row-wise format and back again when doing joins or sorting is also absolutely brutal on performance, compared to just keeping geometries as serialized blobs that are easy to slice and memcpy.
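To make the "simple properties are cheap" point concrete, here is a minimal sketch that computes polygon areas with the shoelace formula directly over columnar coordinates, with no per-geometry deserialization. It assumes a hypothetical GeoArrow-style layout (flat x/y columns plus ring and polygon offset arrays, closed rings, first ring is the shell); this is not necessarily how DuckDB-spatial encodes geometries internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnarPolygons:
    """Hypothetical GeoArrow-style polygon column (not DuckDB-spatial's real layout)."""
    xs: List[float]          # every x coordinate, back to back
    ys: List[float]          # every y coordinate, back to back
    ring_offsets: List[int]  # ring r covers xs[ring_offsets[r]:ring_offsets[r + 1]]
    geom_offsets: List[int]  # polygon p covers rings geom_offsets[p]:geom_offsets[p + 1]


def ring_area(xs: List[float], ys: List[float], start: int, end: int) -> float:
    """Signed shoelace area of a closed ring (last point repeats the first)."""
    twice_area = 0.0
    for i in range(start, end - 1):
        twice_area += xs[i] * ys[i + 1] - xs[i + 1] * ys[i]
    return twice_area / 2.0


def polygon_areas(col: ColumnarPolygons) -> List[float]:
    """Area of each polygon: shell minus holes, read straight off the columns."""
    areas = []
    for p in range(len(col.geom_offsets) - 1):
        first_ring = col.geom_offsets[p]
        total = 0.0
        for r in range(first_ring, col.geom_offsets[p + 1]):
            a = abs(ring_area(col.xs, col.ys, col.ring_offsets[r], col.ring_offsets[r + 1]))
            # The first ring of a polygon is the shell; any later rings are holes.
            total += a if r == first_ring else -a
        areas.append(total)
    return areas


# A unit square, then a 4x4 square with a 2x2 hole.
shapes = ColumnarPolygons(
    xs=[0, 1, 1, 0, 0,  0, 4, 4, 0, 0,  1, 3, 3, 1, 1],
    ys=[0, 0, 1, 1, 0,  0, 0, 4, 4, 0,  1, 1, 3, 3, 1],
    ring_offsets=[0, 5, 10, 15],
    geom_offsets=[0, 1, 3],
)
print(polygon_areas(shapes))  # [1.0, 12.0]
```

This is a single linear pass over contiguous arrays. Operations like intersection or clipping don't reduce to a pass like this, which is where the extra index or graph structures come in.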
That said, I do expect columnar encoding to work really well for rendering in the browser, where transfer speed is the big bottleneck. The paper mentions Arrow as an inspiration, but I wonder why the format isn't just based on (compressed) Arrow in its entirety? I'm not super up to speed on the Arrow ecosystem, but I know there are a couple of query engines that use it not just internally on the CPU, but also to execute on the GPU. If you are going to decode the data and send it over to WebGL anyway, you might as well do the filtering/expression evaluation there too, no? (And leverage the existing techniques/code/interop in the Arrow world.)
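A rough sketch of what "do the filtering there too" could look like on columnar tile data: predicates are evaluated one column at a time into a boolean mask, the mask is compacted into a selection vector, and only the surviving features' coordinate runs are gathered into new flat buffers. The attribute columns kind and rank, the linestring layout, and all function names are hypothetical; this is plain Python for readability, not MapLibre Tile's or Arrow's actual API.

```python
from typing import List, Sequence, Tuple


def eq_mask(col: Sequence[str], value: str) -> List[bool]:
    """Evaluate col == value over an entire attribute column."""
    return [v == value for v in col]


def le_mask(col: Sequence[int], value: int) -> List[bool]:
    """Evaluate col <= value over an entire attribute column."""
    return [v <= value for v in col]


def and_masks(a: List[bool], b: List[bool]) -> List[bool]:
    """Combine two masks element-wise (logical AND)."""
    return [x and y for x, y in zip(a, b)]


def selection_vector(mask: List[bool]) -> List[int]:
    """Compact a boolean mask into the indices of the rows that passed."""
    return [i for i, keep in enumerate(mask) if keep]


def gather_coords(xs: Sequence[float], ys: Sequence[float], offsets: Sequence[int],
                  selected: List[int]) -> Tuple[List[float], List[float], List[int]]:
    """Copy only the selected features' coordinate runs into new flat columns
    with rebased offsets, e.g. to upload as a vertex buffer."""
    out_x: List[float] = []
    out_y: List[float] = []
    out_offsets = [0]
    for i in selected:
        start, end = offsets[i], offsets[i + 1]
        out_x.extend(xs[start:end])
        out_y.extend(ys[start:end])
        out_offsets.append(len(out_x))
    return out_x, out_y, out_offsets


# Three linestring features with hypothetical kind and rank attributes.
kind = ["road", "river", "road"]
rank = [1, 1, 2]
xs = [0, 1, 5, 6, 7, 2, 3]
ys = [0, 1, 5, 5, 5, 2, 2]
offsets = [0, 2, 5, 7]

# kind == "road" AND rank <= 2, evaluated one column at a time.
selected = selection_vector(and_masks(eq_mask(kind, "road"), le_mask(rank, 2)))
print(selected)                                  # [0, 2]
print(gather_coords(xs, ys, offsets, selected))  # ([0, 1, 2, 3], [0, 1, 2, 2], [0, 2, 4])
```

Each step is a data-parallel loop over a whole column with no per-feature decoding, which is the kind of work columnar engines tend to vectorize on the CPU or hand off to a GPU.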
maxxen | 10 months ago | on: DuckDB is probably the most important geospatial software of the last decade
Yes, DuckDB does a whole lot more: vectorized larger-than-memory execution, columnar compressed storage, and an ecosystem of other extensions that make it more than the sum of its parts. But while I've been working hard on making the spatial extension more performant and more broadly useful (I designed a new geometry engine this year, and spatial join optimization just got merged on the dev branch), the fact that you can e.g. convert to and from a myriad of different geospatial formats using GDAL, transform data through SQL, or pull down the latest Overture dump without the whole workflow breaking just because you updated QGIS has probably been the main killer feature for a lot of the early adopters.
(Disclaimer, I work on duckdb-spatial @duckdblabs)
maxxen | 10 months ago | on: DuckDB is probably the most important geospatial software of the last decade
(Disclaimer, I work on duckdb-spatial @duckdblabs)
maxxen | 1 year ago | on: DuckDB 1.1.0 Released
maxxen | 2 years ago | on: Multi-database support in DuckDB
maxxen | 2 years ago | on: DuckDB's AsOf Joins: Fuzzy Temporal Lookups
maxxen | 3 years ago | on: DuckDB 0.7.0
Disclaimer: I wrote duckdb-vss